I’m trying to figure out why my AI workloads aren’t scaling the way I expected after a GPU upgrade. How much does memory bandwidth (and things like VRAM speed, bus width, and PCIe bandwidth) actually impact training/inference performance compared to raw compute? Are there common bandwidth bottlenecks to watch for, and how can I tell if I’m hitting one?
In my experience, yep—bandwidth can totally be the reason scaling feels “stuck,” even after a beefier GPU.
- If GPU util is low but VRAM util/mem controller is pegged, you’re bandwidth-bound (common on big batch/attention stuff), right?
- PCIe bites you when you’re feeding data slow (small batches, lots of host↔device copies). Watch H2D/D2H time in your profiler.
- Easiest tell: profile and see if kernels are “memory bound” + try bumping batch size / fusion… if it barely speeds up, yeah, bottleneck.
i feel u, been there… what framework are you on?
Works great for me
bump