I’m trying to choose a GPU mainly for AI work (local LLM inference + occasional training small models), and I’m confused about which specs/features actually move the needle. Everyone talks about “more VRAM,” but how much does VRAM size vs bandwidth matter in practice? Also, are tensor cores/FP16/BF16 support the big differentiator, or is raw CUDA core count and clock speed still important? I’m on a budget and don’t want to overpay for gaming-focused features I won’t use. What GPU features should I prioritize first for real-world AI performance, and why?
+1 to VRAM-first — if it doesn’t fit, you’re cooked. After that, bandwidth matters more than clocks, and tensor/BF16 support is HUGE; CUDA cores are secondary. Budget picks: NVIDIA GeForce RTX 3090 24GB or NVIDIA GeForce RTX 3060 12GB.
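Quick back-of-envelope on the "does it fit" part: weights alone take roughly param count × bytes per parameter. This is just a sketch — real usage adds KV cache, activations, and runtime overhead on top:

```python
def weights_gb(params_billion, bytes_per_param):
    """Rough VRAM needed for model weights alone, in GB.
    bytes_per_param: 2 for FP16/BF16, 1 for INT8, ~0.5 for 4-bit quant."""
    return params_billion * 1e9 * bytes_per_param / 1e9

print(weights_gb(7, 2))     # 7B model @ FP16  -> 14.0 GB
print(weights_gb(13, 0.5))  # 13B model @ 4-bit -> 6.5 GB
```

So a 7B model at FP16 already eats ~14 GB before you count the KV cache, which is why 12 GB cards push people toward quantized models.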
> +1 to VRAM-first — if it doesn’t fit, you’re cooked.
This^ honestly. I’ll add: for *inference*, bandwidth + cache matter a ton once the model fits, but for *training* you also care about interconnect (PCIe gen, and NVLink if you go multi-GPU… usually not worth it on a budget). Tensor cores/BF16 are the real “AI tax” you *do* want; raw CUDA clocks are like… nice-to-have. In my experience, used NVIDIA GeForce RTX 3090 24GB is still the sweet spot, just budget for a beefy PSU and cooling cuz it can be sketchy in small cases lol
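To put numbers on "bandwidth matters once it fits": batch-1 LLM decode is roughly memory-bound, since every generated token has to stream all the weights from VRAM once. That gives a simple throughput ceiling of bandwidth ÷ model size — a sketch using the 3090's ~936 GB/s spec (ignores KV cache reads and other overhead, so real throughput lands below this):

```python
def decode_tps_ceiling(bandwidth_gb_s, model_size_gb):
    """Upper bound on batch-1 decode speed (tokens/sec): each token
    reads all weights once, so speed <= bandwidth / model size."""
    return bandwidth_gb_s / model_size_gb

# RTX 3090: ~936 GB/s; a 7B model at FP16 is ~14 GB of weights
print(decode_tps_ceiling(936, 14))  # ~66.9 tok/s theoretical ceiling
```

This is also why quantization speeds up inference, not just fitting: a 4-bit model moves ~4x fewer bytes per token than FP16.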
Same setup here, love it
Hey, been there… I bought a “fast” GPU once and still got VRAM-capped constantly lol. For AI, priority order (imo):
- VRAM size first: if the model doesn't fit, you’re done. Used NVIDIA GeForce RTX 3090 24GB is still a killer value.
- Then VRAM bandwidth: big for LLM inference speed.
- Then tensor/BF16/FP16: huge for training + faster matmul.
CUDA cores/clock matter, but only after the above. (at least that's what worked for me) good luck tho
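Putting that priority order together — a tiny fit-check you can adapt before buying. The headroom number is a rough guess for KV cache + CUDA context, not a measured value:

```python
def fits(vram_gb, params_billion, bytes_per_param, headroom_gb=2.0):
    """Check whether model weights + headroom fit in VRAM.
    headroom_gb covers KV cache / CUDA context (rough guess).
    bytes_per_param: 2 for FP16/BF16, 1 for INT8, ~0.5 for 4-bit."""
    weights_gb = params_billion * bytes_per_param  # (1e9 params * bytes) / 1e9
    return weights_gb + headroom_gb <= vram_gb

print(fits(24, 13, 2))    # 13B @ FP16 on 24GB: 26 + 2 > 24 -> False
print(fits(24, 13, 0.5))  # 13B @ 4-bit: 6.5 + 2 <= 24     -> True
print(fits(12, 7, 2))     # 7B @ FP16 on 12GB: 14 + 2 > 12 -> False
```

Which roughly matches the thread: a 12GB card means quantized 7B-class models, while 24GB opens up 13B at FP16-ish precision (with quant) and bigger models quantized.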