VRAM Requirements

The amount of VRAM available determines which LLMs you can run locally and at what precision.

Estimation Formula

VRAM (GB) ≈ Parameters (B) × Bytes per Parameter × 1.2

Where:

  • FP16: 2 bytes/param
  • INT8: 1 byte/param
  • 4-bit: 0.5 bytes/param
  • 1.2: Multiplier adding roughly 20% overhead for KV cache and activations
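
The formula above can be sketched as a small helper. This is a minimal illustration, not a definitive sizing tool: the function name `estimate_vram_gb` and the `BYTES_PER_PARAM` table are hypothetical names introduced here, and the 1.2 overhead factor is taken directly from the formula.

```python
# Bytes per parameter for each precision listed above.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "4bit": 0.5}


def estimate_vram_gb(params_billion: float, precision: str, overhead: float = 1.2) -> float:
    """Estimate VRAM in GB as parameters (B) x bytes per parameter x overhead."""
    return params_billion * BYTES_PER_PARAM[precision] * overhead


if __name__ == "__main__":
    # Print the estimate for a 7B model at each precision.
    for precision in BYTES_PER_PARAM:
        print(f"7B @ {precision}: {estimate_vram_gb(7, precision):.1f} GB")
```

Passing `overhead=1.0` gives the size of the weights alone, which is useful when the KV cache is budgeted separately.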

Quick Reference

Model Size    FP16      INT8     4-bit
7B            14 GB     7 GB     4 GB
13B           26 GB     13 GB    7 GB
34B           68 GB     34 GB    18 GB
70B           140 GB    70 GB    35 GB

These figures are approximately the size of the weights alone (parameters × bytes per parameter, rounded up), before the 1.2 overhead factor is applied.

Context Length Impact

The KV cache grows linearly with context length, so longer contexts need more VRAM on top of the model weights:

Context    Additional VRAM (7B)
2K         +0.5 GB
8K         +2 GB
32K        +8 GB
128K       +32 GB
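
The table's figures work out to about 0.25 GB per 1K tokens for a 7B model. The sketch below simply extends that rate to other context lengths. It assumes the linear trend holds between and beyond the listed points and that 1K means 1024 tokens. The names `GB_PER_1K_TOKENS_7B` and `kv_cache_gb_7b` are hypothetical, and real usage depends on the model architecture and KV cache precision.

```python
# Rate implied by the table above for a 7B model: +0.5 GB at 2K tokens,
# i.e. 0.25 GB per 1K tokens. Assumption: 1K = 1024 tokens.
GB_PER_1K_TOKENS_7B = 0.25


def kv_cache_gb_7b(context_tokens: int) -> float:
    """Approximate additional VRAM (GB) for a 7B model at a given context length."""
    return context_tokens / 1024 * GB_PER_1K_TOKENS_7B


if __name__ == "__main__":
    # Reproduce the table rows, plus one intermediate context length.
    for tokens in (2048, 8192, 16384, 32768, 131072):
        print(f"{tokens // 1024}K context: +{kv_cache_gb_7b(tokens):g} GB")
```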

Consumer GPU VRAM

GPU             VRAM     Max Model (4-bit)
RTX 3060        12 GB    ~20B
RTX 4070        12 GB    ~20B
RTX 4090        24 GB    ~45B
Apple M2 Pro    16 GB    ~25B
Apple M3 Max    64 GB    ~100B
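
The "Max Model" column can be approximated by inverting the estimation formula: divide the available VRAM by bytes per parameter × 1.2. The sketch below is one way to do that. The function name `max_params_billion` is hypothetical, and the result is a rough upper bound that may not match the rounded table values exactly (for example, 24 GB works out to about 40B here, while the table lists ~45B).

```python
def max_params_billion(vram_gb: float, bytes_per_param: float = 0.5, overhead: float = 1.2) -> float:
    """Invert the estimation formula: largest model size (billions of parameters) that fits."""
    return vram_gb / (bytes_per_param * overhead)


if __name__ == "__main__":
    # VRAM sizes taken from the table above; 4-bit (0.5 bytes/param) is the default.
    for name, vram in [("RTX 3060", 12), ("RTX 4090", 24), ("Apple M3 Max", 64)]:
        print(f"{name} ({vram} GB): ~{max_params_billion(vram):.0f}B at 4-bit")
```

Pass `bytes_per_param=1.0` or `2.0` to get the corresponding limit for INT8 or FP16.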