
Cross Attention — Step-by-Step Visualization


Step through cross attention — watch the decoder query attend to encoder key-value pairs, connecting the encoder's context to each decoder generation step.

Algorithm Pattern

Query from Decoder, Keys/Values from Encoder

Key Idea

Cross attention lets the decoder look at the encoder's output at each generation step. Q comes from the decoder; K and V come from the encoder. This is how seq2seq Transformers condition generation on the input sequence.

Step-by-Step Approach

  1. Q = decoder_state × W_Q (what am I looking for?).
  2. K = encoder_output × W_K (what does each encoder token contain?).
  3. V = encoder_output × W_V (what to retrieve if attention is high?).
  4. Scores = Q·K^T / √d_k (scaled dot product; d_k is the dimension of each key vector).
  5. Output = softmax(Scores) · V (weighted sum of encoder values; the softmax is taken over encoder positions).
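
The five steps above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not a reference implementation: the function names, the toy dimensions, and the random weights are all assumptions made for the example, and masking, batching, and multiple heads are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_state, encoder_output, W_Q, W_K, W_V):
    """Single-head cross attention (illustrative sketch)."""
    Q = decoder_state @ W_Q               # step 1: (T_dec, d_k)
    K = encoder_output @ W_K              # step 2: (T_enc, d_k)
    V = encoder_output @ W_V              # step 3: (T_enc, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # step 4: (T_dec, T_enc)
    weights = softmax(scores, axis=-1)    # each row sums to 1 over encoder positions
    output = weights @ V                  # step 5: (T_dec, d_v)
    return output, weights

# Toy sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
d_model, d_k, d_v = 8, 4, 4
encoder_output = rng.normal(size=(5, d_model))   # 5 encoder tokens
decoder_state = rng.normal(size=(3, d_model))    # 3 decoder positions
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_v))

output, weights = cross_attention(decoder_state, encoder_output, W_Q, W_K, W_V)
print(output.shape, weights.shape)
```

Each row of `weights` is one decoder position's distribution over the encoder tokens. Passing the same array as both `decoder_state` and `encoder_output` would turn this sketch into unmasked self-attention, which is the only structural difference between the two.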

Common Gotchas

  • In cross attention K and V come from the encoder; in self-attention all three come from the same source.
  • At inference, K and V depend only on the encoder output, so they are computed once per input sequence and cached; only Q is recomputed at each decoding step.
  • Cross attention connects encoder and decoder in T5, BART, and other encoder-decoder (seq2seq) Transformers.
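
The caching point can be illustrated with a small sketch. The `CachedCrossAttention` class and its method names below are hypothetical, written for this example rather than taken from any library; the sketch assumes a single head and one decoder state vector per step.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class CachedCrossAttention:
    """Hypothetical helper: project the encoder output once, reuse K and V."""

    def __init__(self, W_Q, W_K, W_V):
        self.W_Q, self.W_K, self.W_V = W_Q, W_K, W_V
        self.K = None
        self.V = None

    def set_encoder_output(self, encoder_output):
        # Done once per input sequence, before decoding starts.
        self.K = encoder_output @ self.W_K    # (T_enc, d_k)
        self.V = encoder_output @ self.W_V    # (T_enc, d_v)

    def step(self, decoder_state):
        # Only the query is recomputed at each decoding step.
        q = decoder_state @ self.W_Q                   # (d_k,)
        scores = self.K @ q / np.sqrt(q.shape[-1])     # (T_enc,)
        weights = softmax(scores)
        return weights @ self.V                        # (d_v,)

# Toy sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(1)
d_model, d_k, d_v = 8, 4, 4
encoder_output = rng.normal(size=(6, d_model))
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_v))

attn = CachedCrossAttention(W_Q, W_K, W_V)
attn.set_encoder_output(encoder_output)

# Three decoding steps reuse the same cached K and V.
decoder_states = rng.normal(size=(3, d_model))
cached_outputs = np.stack([attn.step(s) for s in decoder_states])

# Reference: recompute K and V from scratch and attend in one batch.
K = encoder_output @ W_K
V = encoder_output @ W_V
Q = decoder_states @ W_Q
reference = softmax(Q @ K.T / np.sqrt(d_k)) @ V
print(np.allclose(cached_outputs, reference))
```

The cached path and the from-scratch path perform the same arithmetic; the cache only avoids repeating the two encoder projections at every step, which matters more as the input sequence gets longer.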
