# VRAM Requirements
Understanding VRAM needs is essential for running LLMs locally.
## Estimation Formula
VRAM (GB) ≈ Parameters (B) × Bytes per Parameter × 1.2
Where:
- FP16: 2 bytes/param
- INT8: 1 byte/param
- 4-bit: 0.5 bytes/param
- 1.2x: overhead for the KV cache and activations (see the sketch below)
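A minimal sketch of this formula in Python, using the byte counts above (the function name and signature are illustrative, not from any particular library):

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB for loading and running a model.

    params_billion  -- model size in billions of parameters
    bytes_per_param -- 2.0 for FP16, 1.0 for INT8, 0.5 for 4-bit
    overhead        -- ~1.2x headroom for the KV cache and activations
    """
    return params_billion * bytes_per_param * overhead

# Example: a 7B model quantized to 4 bits.
print(f"{estimate_vram_gb(7, 0.5):.1f} GB")  # ~4.2 GB
```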
## Quick Reference
Approximate weight memory only; the 1.2x overhead comes on top:
| Model Size | FP16 | INT8 | 4-bit |
|---|---|---|---|
| 7B | 14 GB | 7 GB | 4 GB |
| 13B | 26 GB | 13 GB | 7 GB |
| 34B | 68 GB | 34 GB | 18 GB |
| 70B | 140 GB | 70 GB | 35 GB |
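These rows follow, to within a GB of rounding, from the bytes-per-parameter figures alone. A quick sketch that prints the weight-only numbers; note that real 4-bit formats store slightly more than 0.5 bytes/param because of quantization scales, so quoted figures round up a little:

```python
# Weight-only memory for the model sizes in the table above.
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "4-bit": 0.5}

for size_b in (7, 13, 34, 70):
    row = " | ".join(f"{fmt}: {size_b * bpp:g} GB"
                     for fmt, bpp in BYTES_PER_PARAM.items())
    print(f"{size_b}B -> {row}")
```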
## Context Length Impact
The KV cache grows linearly with context length. Approximate figures for a 7B model (the exact cost depends on the attention layout and KV-cache precision):
| Context | Additional VRAM (7B) |
|---|---|
| 2K | +0.5 GB |
| 8K | +2 GB |
| 32K | +8 GB |
| 128K | +32 GB |
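The per-token cost can be derived from the model's attention layout. A rough sketch, assuming a Llama-2-7B-style configuration (32 layers, 32 KV heads, head dimension 128) with an FP16 cache; models with grouped-query attention or an 8-bit KV cache need proportionally less, which is why published figures, including those above, can differ by about 2x:

```python
def kv_cache_gb(context_len: int, num_layers: int = 32, num_kv_heads: int = 32,
                head_dim: int = 128, bytes_per_elem: float = 2.0) -> float:
    """KV cache size in GB: one K and one V vector per layer, KV head, and token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * context_len / 1e9

for ctx in (2_048, 8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens: ~{kv_cache_gb(ctx):.1f} GB")
```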
## Consumer GPU VRAM
Note that the Apple figures are unified memory shared with the OS, so not all of it is available to the model.
| GPU | VRAM | Max Model (4-bit) |
|---|---|---|
| RTX 3060 | 12 GB | ~20B |
| RTX 4070 | 12 GB | ~20B |
| RTX 4090 | 24 GB | ~45B |
| Apple M2 Pro | 16 GB | ~25B |
| Apple M3 Max | 64 GB | ~100B |
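Inverting the estimation formula gives a quick way to gauge what fits on a given card. A sketch using a few rows from the table above; because it keeps the full 1.2x overhead, it comes out somewhat more conservative than the table's ballpark figures:

```python
def max_params_billion(vram_gb: float, bytes_per_param: float = 0.5,
                       overhead: float = 1.2) -> float:
    """Largest model (billions of parameters) that roughly fits in vram_gb."""
    return vram_gb / (bytes_per_param * overhead)

for gpu, vram_gb in [("RTX 3060", 12), ("RTX 4090", 24), ("Apple M3 Max", 64)]:
    print(f"{gpu}: up to ~{max_params_billion(vram_gb):.0f}B parameters at 4-bit")
```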