Step through scaled dot-product self-attention — see how Q, K, V matrices are computed and how each token attends to every other token via the attention heatmap.