How much VRAM do I need for LoRA training?

Question

I’m trying to train a LoRA locally (not full fine-tuning) and I’m confused about how much VRAM I realistically need to avoid constant OOM errors. I’m looking at 7B vs 13B base models, and I’d like to train on 512–1024 token sequences with a small batch size (I can use gradient accumulation if needed). I also want to use mixed precision and maybe 4-bit/8-bit loading, but I’m not sure how much that actually saves during training. For stable LoRA training (not just barely running), what VRAM should I target for 7B and 13B, and what settings impact VRAM the most?

knjidsjrmy · Accepted Answer

Watch out for the “it loads in 4-bit so I’m safe” trap — you’ll still OOM from activations + optimizer states once you crank seq length, batch, and LoRA rank. Been there, same mood lol.

For your situation, I’d target ~12–16GB VRAM for comfy 7B LoRA at 512–1024 tokens (mixed precision, small micro-batch + grad accum). You can sometimes squeak by on ~8–10GB if you keep batch=1, lower the rank, and use gradient checkpointing, but it’s kinda misery. For 13B, honestly 20–24GB feels “stable” for 1024 tokens; ~16GB can work at 512 tokens with aggressive tricks (4-bit loading, gradient checkpointing, batch=1).

Biggest VRAM knobs: sequence length (huge), micro-batch size, gradient checkpointing (saves a ton), LoRA rank/target modules, and whether you use an 8-bit optimizer. Good luck, cheers
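
If you want rough numbers for how seq length, micro-batch, and checkpointing trade off, here's a back-of-envelope sketch. It uses the per-layer activation estimate from the Megatron-LM activation recomputation paper (about 34·s·b·h + 5·a·s²·b bytes in 16-bit for a GPT-style block). LLaMA-style blocks differ a bit, and I'm assuming flash-style attention simply drops the s² term and that checkpointing keeps only each layer's input, so treat the output as order-of-magnitude only. The function name and the 7B shape values are just for illustration.

```python
def activation_bytes(seq_len, micro_batch, hidden, n_layers, n_heads,
                     flash_attention=False, checkpointing=False):
    """Rough 16-bit activation memory (bytes) for a GPT-style transformer."""
    # Linear-in-sequence part: layer norms, projections, MLP, dropout masks.
    linear_part = 34 * seq_len * micro_batch * hidden
    # Quadratic part: attention scores/probabilities (assumed gone with flash attention).
    attn_part = 0 if flash_attention else 5 * n_heads * seq_len ** 2 * micro_batch
    per_layer = linear_part + attn_part
    if checkpointing:
        # Assumption: store only each layer's 16-bit input, plus the one
        # layer that is being recomputed during the backward pass.
        saved_inputs = 2 * seq_len * micro_batch * hidden * n_layers
        return saved_inputs + per_layer
    return per_layer * n_layers


GIB = 1024 ** 3
shape_7b = dict(hidden=4096, n_layers=32, n_heads=32)  # LLaMA-7B-like shape

for seq_len in (512, 1024):
    for ckpt in (False, True):
        b = activation_bytes(seq_len, 1, checkpointing=ckpt, **shape_7b)
        print(f"seq={seq_len} micro_batch=1 checkpointing={ckpt}: {b / GIB:.2f} GiB")
```

Note that only the micro-batch shows up here: gradient accumulation steps raise the effective batch size without raising activation memory, which is why micro-batch=1 plus accumulation is the usual low-VRAM setup.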

yeznkdnklk · Answer

Been using this for years, no complaints

Astral90 · Answer

Basically, everyone here is spot on about activations being the real bottleneck rather than just the model weights. If you look at the math, a 7B model in 4-bit (QLoRA) only takes roughly 4–5GB for weights, and the gradients and optimizer states for a modest-rank LoRA are small by comparison (usually hundreds of MB rather than multiple GB, since only the adapter weights are trained, though it grows if you push the rank high). The real VRAM killer is activations at 1024 sequence length. The KV cache is an inference-time cost and doesn't really factor into training.

If you go with something like the NVIDIA GeForce RTX 4090 24GB, you get access to Flash Attention 2 and Xformers, which are way more memory efficient for those longer sequences than standard attention, since they avoid materializing the full seq × seq score matrix. Comparing brands, while the AMD Radeon RX 7900 XTX 24GB has the raw VRAM for a better price, the software ecosystem for bitsandbytes and PEFT is just much more mature on the NVIDIA side.

If you are serious about 13B models, don't settle for less than 24GB. You'll still need gradient checkpointing turned on to keep from hitting OOM, but it makes the process actually usable instead of a constant headache. Tbh, 24GB is the sweet spot for 7B/13B LoRAs right now.
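
To put rough numbers on the weights vs. adapter-state math, here's a minimal sketch. It assumes LLaMA-7B-style shapes (hidden 4096, MLP 11008, 32 layers), LoRA rank 16 on all seven linear projections per layer, and fp32 adapters trained with plain AdamW (weight + gradient + two moment buffers = 16 bytes per trainable parameter). The helper names are made up for illustration, and real usage adds overhead for quantization constants, unquantized embeddings, and the CUDA context.

```python
GIB = 1024 ** 3


def weight_bytes(n_params, bits):
    """Bytes needed to hold n_params weights at the given bit width."""
    return n_params * bits // 8


def lora_param_count(rank, n_layers, module_shapes):
    """Trainable LoRA params: each adapted (d_in, d_out) linear adds rank * (d_in + d_out)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in module_shapes)
    return per_layer * n_layers


# Assumed LLaMA-7B-style shapes: q/k/v/o projections plus gate/up/down MLP projections.
hidden, mlp, layers = 4096, 11008, 32
all_linear = [(hidden, hidden)] * 4 + [(hidden, mlp)] * 2 + [(mlp, hidden)]

n_lora = lora_param_count(16, layers, all_linear)
# Assumption: fp32 weight + fp32 grad + AdamW first/second moments = 16 bytes per param.
train_state_bytes = n_lora * 16

for bits in (16, 8, 4):
    print(f"7B base weights at {bits}-bit: {weight_bytes(7_000_000_000, bits) / GIB:.1f} GiB")
print(f"LoRA r=16 trainable params: {n_lora / 1e6:.1f}M")
print(f"LoRA weights + grads + AdamW state: {train_state_bytes / GIB:.2f} GiB")
```

Under these assumptions the adapter and its optimizer state stay well below the base weights, so the remaining budget mostly goes to activations.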