Question 1

What is the algorithm pattern for KV Cache?

Accepted Answer

Memoize Past Key-Value Pairs: During autoregressive generation, each new token must attend to all previous tokens. Without a cache, K and V for every past token are recomputed at each step. The KV cache computes and stores them once, so each step only scores the newest query against the cached keys. This reduces the attention cost per generated token from O(n²) to O(n) in the sequence length n.
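To make the O(n²) versus O(n) claim concrete, here is a minimal sketch that counts query-key dot products at a single decode step. The function names are hypothetical, and the count assumes one attention head and ignores the projection work itself.

```python
def scores_without_cache(t: int) -> int:
    """Dot products at step t when the whole prefix is re-run through
    causal attention: token i is scored against i positions."""
    return t * (t + 1) // 2


def scores_with_cache(t: int) -> int:
    """Only the newest query is scored against the t cached keys."""
    return t


for t in (10, 100, 1000):
    print(t, scores_without_cache(t), scores_with_cache(t))
# 10 55 10
# 100 5050 100
# 1000 500500 1000
```

The first count grows quadratically with the step index and the second grows linearly, which is the gap the cache closes.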

Question 2

How does a KV cache work, step by step?

Accepted Answer

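For each new token t, compute Q_t, K_t, and V_t from that token's hidden state (at the first layer, its embedding). Without a cache: recompute K_i and V_i for all i < t at every step. With a cache: store K_t and V_t immediately after computing them. On the next step: retrieve the past K and V from the cache, compute K and V only for the new token, and attend with the new query over the cached entries plus the new one. Memory trade-off: the cache grows linearly with sequence length, and every layer keeps its own cache.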
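The steps above can be sketched for a single attention head in plain Python. This is an illustrative toy, not a production implementation: the weights are random, the sizes are tiny, and every name (`project`, `attend`, `decode_with_cache`, and so on) is made up for the example. It checks that the cached loop gives the same outputs as recomputing K and V at every step.

```python
import math
import random

random.seed(0)
D_MODEL, HEAD_DIM, SEQ_LEN = 4, 3, 5


def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical projection weights for a single attention head.
W_q, W_k, W_v = (rand_matrix(D_MODEL, HEAD_DIM) for _ in range(3))
hidden = rand_matrix(SEQ_LEN, D_MODEL)  # one hidden state per token


def project(x, W):
    """Row vector x (length D_MODEL) times matrix W (D_MODEL x HEAD_DIM)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def attend(q, keys, values):
    """Softmax attention of one query over the given keys and values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(HEAD_DIM) for k in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(HEAD_DIM)]


def decode_without_cache(hidden):
    outputs = []
    for t in range(len(hidden)):
        # Recompute K_i and V_i for every token i <= t at each step.
        keys = [project(h, W_k) for h in hidden[: t + 1]]
        values = [project(h, W_v) for h in hidden[: t + 1]]
        outputs.append(attend(project(hidden[t], W_q), keys, values))
    return outputs


def decode_with_cache(hidden):
    k_cache, v_cache, outputs = [], [], []
    for h in hidden:
        q = project(h, W_q)
        k_cache.append(project(h, W_k))  # store K_t once
        v_cache.append(project(h, W_v))  # store V_t once
        outputs.append(attend(q, k_cache, v_cache))
    return outputs


slow = decode_without_cache(hidden)
fast = decode_with_cache(hidden)
matches = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
              for row_a, row_b in zip(slow, fast) for a, b in zip(row_a, row_b))
print(matches)  # True
```

The two loops differ only in where K and V come from; the cached version appends one row per step, which is the linear memory growth noted above.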

Question 3

What are common mistakes when working with a KV cache?

Accepted Answer

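Underestimating memory: per sequence, KV cache memory = 2 × num_layers × num_kv_heads × head_dim × seq_len × bytes_per_element, where the 2 covers K and V; multiply by the batch size for concurrent sequences. Use the number of key/value heads, which is smaller than the number of query heads under grouped-query or multi-query attention. Reserving one contiguous, maximum-length buffer per sequence: paged attention (vLLM) instead manages KV cache memory in fixed-size blocks, much like virtual memory pages, which reduces fragmentation. Caching the wrong attention: KV caching applies to causal decoder self-attention, where the K and V of past tokens do not change as new tokens are appended. In encoder-decoder models, the cross-attention K and V come from the encoder output, so they are computed once and reused across all decode steps.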
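As a worked example of the memory formula, the sketch below evaluates it for an illustrative configuration. The function name, the `batch_size` and `num_kv_heads` parameters, and the model dimensions are assumptions chosen for the example, not values taken from the text above.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_element=2):
    """Bytes needed to hold K and V for every layer, head, and position."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_element)


# Assumed configuration: 32 layers, 32 KV heads, head_dim 128,
# 4096 tokens, fp16 (2 bytes per element), one sequence.
full = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)
print(full / 2**30)  # 2.0 (GiB)

# Same shape with 8 KV heads, as grouped-query attention might use.
gqa = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=4096)
print(gqa / 2**30)  # 0.5 (GiB)
```

Every term enters as a plain product, so doubling the sequence length or the batch size doubles the cache, while sharing K/V heads shrinks it proportionally.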
