I have been playing around with cloud-based AI services for a while now, but I am really itching to move everything to a local setup. Between the privacy concerns and the ongoing monthly subscription costs, it just feels like the right time to invest in some solid hardware. My main goal is to run larger models like Llama 3 or Mistral smoothly, but I also want to dive into some actual AI training, specifically fine-tuning with LoRA.
I am a bit torn on where the sweet spot is for performance right now. I know that VRAM is absolutely king when it comes to LLMs, but I am struggling to decide if I should hunt for a used RTX 3090 to get that 24GB of memory on a budget, or if it is worth the massive price jump for a 4090 to get the better architecture and speed. I have also seen some people suggesting dual-GPU setups, but I am worried about the complexity and power draw of a multi-card rig in a standard mid-tower case.
Has anyone here benchmarked these for actual training workflows recently? I am specifically curious about the real-world difference in tokens per second and how much the extra memory bandwidth actually matters for daily use. What would you recommend as the best value workhorse GPU for someone getting serious about local AI right now?
I would suggest going for a used NVIDIA GeForce RTX 3090 24GB. It is basically the budget king for LLMs right now!
Seconding the recommendation above! I had a moment to think about this more and I totally get the struggle between the 3090 and 4090. VRAM is the main constraint for running Llama 3, and both cards have the same 24GB. The 4090 is amazing, but the price-to-performance ratio is kinda wild for a first local setup. So, here are some thoughts on the value route:
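To put rough numbers on the VRAM and bandwidth question, here is a back-of-envelope sketch, not a benchmark. It assumes nominal parameter counts (7B, 8B, 70B), ignores KV cache and quantization overhead, reserves a guessed 2 GiB of headroom, and uses the common rule of thumb that single-stream generation can go no faster than memory bandwidth divided by the bytes of weights read per token. The function names are made up for illustration; the bandwidth figures are the published specs for each card.

```python
# Back-of-envelope sizing. Every figure here is an approximation, not a benchmark.
GIB = 1024 ** 3

# Published memory bandwidth in GB/s. Both cards ship with 24 GB of VRAM.
BANDWIDTH_GBPS = {"RTX 3090": 936.0, "RTX 4090": 1008.0}

# Nominal parameter counts (assumption: round numbers, not exact counts).
MODELS = {"Mistral 7B": 7e9, "Llama 3 8B": 8e9, "Llama 3 70B": 70e9}

# Bytes per weight. The 4-bit figure ignores scales/zero-points (assumption).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_bytes(params: float, precision: str) -> float:
    """Memory needed for the weights alone (no KV cache, no activations)."""
    return params * BYTES_PER_PARAM[precision]


def fits(params: float, precision: str,
         vram_gib: float = 24.0, headroom_gib: float = 2.0) -> bool:
    """True if the weights fit with some headroom left (2 GiB is a guess)."""
    return weight_bytes(params, precision) / GIB <= vram_gib - headroom_gib


def decode_ceiling_tps(params: float, precision: str,
                       bandwidth_gbps: float) -> float:
    """Upper bound on tokens/s if every weight is read once per token."""
    return bandwidth_gbps * 1e9 / weight_bytes(params, precision)


for model, params in MODELS.items():
    for precision in BYTES_PER_PARAM:
        size_gib = weight_bytes(params, precision) / GIB
        verdict = "fits" if fits(params, precision) else "does not fit"
        ceilings = ", ".join(
            f"{gpu} <= {decode_ceiling_tps(params, precision, bw):.0f} tok/s"
            for gpu, bw in BANDWIDTH_GBPS.items()
        )
        print(f"{model} {precision}: {size_gib:.1f} GiB, {verdict} in 24 GB; {ceilings}")
```

Under these assumptions the two cards differ by under 8% on the generation ceiling (936 vs 1008 GB/s), so for plain chat-style inference the extra bandwidth is unlikely to be the deciding factor. Where the 4090 is more likely to pull ahead is compute-heavy work such as prompt processing and training steps, which this estimate does not model at all. Real throughput will land below these ceilings.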
Finally someone says it. I've been thinking this for a while but wasn't sure.