Hey all — I’m trying to figure out the best *budget* GPU to buy specifically for Stable Diffusion training (not just generating images). I’ve been running SD locally for a while, but I want to start training small LoRAs and maybe some lightweight DreamBooth-style fine-tunes for a couple of personal projects. Right now I’m stuck using a mix of Google Colab and a pretty old card, and I’m tired of sessions timing out or VRAM errors killing runs halfway through.
My main question is where the “sweet spot” is for price vs. training usability. I’m not expecting miracles, but I also don’t want to buy something cheap that ends up being basically generation-only. From what I can tell, VRAM is the big limiter — I keep seeing people say 12GB is the minimum for comfortable LoRA training, and 16GB+ is way better if you don’t want to constantly fight batch sizes, resolution limits, or heavy gradient checkpointing.
Constraints: I’m aiming for around $300–$450 (could stretch a bit if it’s really worth it). I’m in the US and I’m okay buying used. My PSU is 650W and my case is mid-tower, so I’m trying to avoid super power-hungry cards. I’m also trying to keep this simple on the software side — I’ve heard NVIDIA/CUDA is still less painful for most SD training setups than AMD, but I don’t know if that’s still true.
Given that I want to train LoRAs reliably (and not wait forever per epoch), what’s the best budget GPU right now for Stable Diffusion training — and what VRAM amount should I treat as the practical minimum?
For your situation, I’d treat **12GB as the practical minimum** for LoRA training that doesn’t feel like whack-a-mole with OOM, and **16GB as the “sweet spot”** if you want to do DreamBooth-ish stuff or push 768 without suffering.
- **Option A (best bang-for-buck used): NVIDIA GeForce RTX 4060 Ti 16GB**
- Pros: 16GB VRAM is honestly the whole game for training usability, power draw is chill (your 650W PSU is fine), CUDA setup is painless.
- Cons: raw speed isn’t amazing for the price, but it’s “reliable and not annoying,” which matters.
- **Option B (cheapest that still works): NVIDIA GeForce RTX 3060 12GB**
- Pros: usually the best used deal in the US, and 12GB can train LoRAs at 512 pretty comfortably (at least that’s what worked for me).
- Cons: 12GB hits a wall fast at higher res / bigger batches, and you’ll be leaning on grad checkpointing a lot.
- **Option C (avoid for training): NVIDIA GeForce RTX 4060 8GB / 8GB cards**
- Pros: fine for generation.
- Cons: training means constant compromises: low resolution, batch size 1, and aggressive memory-saving tricks just to get a run to finish.
Also: VRAM > everything, but you still want decent system RAM (32GB helps) and don’t cheap out on cooling. Anyway, I’d buy the 4060 Ti 16GB if you can snag it near the top of your budget. gl!
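To make the “12GB minimum” concrete, here’s a back-of-envelope VRAM calculator. All the component sizes in it are my own rough assumptions (SD 1.5-class model in fp16, LoRA-only optimizer states, activations scaling with batch and resolution), not measurements — real usage varies a lot with the attention implementation, xformers, etc. Treat it as a sketch of *why* 512/batch-1 fits on 12GB and 768/bigger batches starts to hurt:

```python
# Back-of-envelope VRAM estimate for SD 1.5-class LoRA training.
# Every constant below is a rough assumption, not a measurement.

def estimate_vram_gb(batch_size: int, resolution: int,
                     grad_checkpointing: bool = False) -> float:
    weights_fp16 = 2.0        # ~2 GB: UNet + text encoder + VAE in fp16
    lora_overhead = 0.2       # LoRA params + grads + Adam states (tiny)
    # Activations scale ~linearly with batch and roughly quadratically
    # with resolution (attention); ~3 GB at batch 1 @ 512 is a ballpark.
    activations = 3.0 * batch_size * (resolution / 512) ** 2
    if grad_checkpointing:
        activations *= 0.4    # checkpointing trades recompute for memory
    framework_overhead = 1.0  # CUDA context, workspaces, fragmentation
    return weights_fp16 + lora_overhead + activations + framework_overhead

for bs, res in [(1, 512), (4, 512), (1, 768)]:
    print(f"batch={bs} res={res}: ~{estimate_vram_gb(bs, res):.1f} GB "
          f"({estimate_vram_gb(bs, res, grad_checkpointing=True):.1f} GB w/ ckpt)")
```

Even with generous slop in those numbers, you can see how an 8GB card forces checkpointing + batch 1 just to fit 512, while 12GB leaves headroom and 16GB makes 768 comfortable.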
For your situation, I’d treat **12GB VRAM as the floor** for “reliable” LoRA training, and **16GB as the real sweet spot** if you don’t want to live in checkpointing/low-res hell. Best budget pick imo is a used NVIDIA GeForce RTX 4060 Ti 16GB (usually lands in your $350–$450 window); power draw is modest for a 650W PSU, and CUDA support is basically pain-free.
If you want to go cheaper, a used NVIDIA GeForce RTX 3060 12GB is still decent for LoRA at 512; just expect slower epochs and more OOM babysitting at 768 or DreamBooth-style runs. And yeah, AMD can work, but NVIDIA is still the “it just works” path for training. gl!
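For reference, the usual memory-savers on a 12GB card are batch size 1, fp16, gradient checkpointing, cached latents, and an 8-bit optimizer. A typical invocation with kohya’s sd-scripts might look like this — all paths, the model name, and the hyperparameters are placeholders, and you should check the flags against your sd-scripts version:

```shell
# Sketch only: paths/values are placeholders; flags per kohya sd-scripts' train_network.py
accelerate launch train_network.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="./dataset" \
  --output_dir="./output" \
  --resolution="512,512" \
  --train_batch_size=1 \
  --mixed_precision="fp16" \
  --gradient_checkpointing \
  --cache_latents \
  --optimizer_type="AdamW8bit" \
  --network_module=networks.lora \
  --network_dim=16 \
  --learning_rate=1e-4 \
  --max_train_steps=2000
```

On a 16GB card you can usually drop `--gradient_checkpointing` and bump the batch size, which is most of the speed difference people notice.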
yo, I feel you. I’m happiest with an NVIDIA card with *at least* 12GB VRAM for LoRA; 16GB+ is way less annoying. Quick q: are you training at 512 or trying 768/1024, and do you care more about speed or just about not OOM’ing?
> I’ve heard NVIDIA/CUDA is still less painful for most SD training setups than AMD

Yeah, I agree with this. I’m still fairly new to training myself, but after a bunch of research it really does seem like NVIDIA is the path of least resistance. I was tempted by AMD because the raw memory specs look great for the price, but basically every guide says it’s a headache to set up for training specifically. Honestly, just get the NVIDIA card with the most VRAM that fits your budget; you’ll save a lot of time not fighting drivers and random errors. Does anyone know why the software gap is still so big, though? It feels like we’re stuck paying the “NVIDIA tax” just to have things actually work, which is annoying, but probably worth it if you don’t want to spend all day on forums.