Step through cross attention — watch the decoder query attend to encoder key-value pairs, connecting the encoder's context to each decoder generation step.
Query from Decoder, Keys/Values from Encoder
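Cross attention lets the decoder look at the encoder's output at each generation step. The queries (Q) are projected from the decoder's hidden states, while the keys (K) and values (V) are projected from the encoder's output. Each decoder position scores every encoder position, and the resulting weights mix the encoder values into a context vector. This is how encoder-decoder (seq2seq) transformers condition generation on the input sequence.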
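To make the Q/K/V split concrete, here is a minimal single-head sketch in NumPy. It assumes standard scaled dot-product attention; the function name `cross_attention`, the projection matrices `W_q`, `W_k`, `W_v`, and the toy dimensions are illustrative choices rather than anything taken from a specific library, and the random weights stand in for learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_states, W_q, W_k, W_v):
    """Single-head cross attention (hypothetical helper for illustration).

    decoder_states: (T_dec, d_model) hidden states of the decoder
    encoder_states: (T_enc, d_model) output of the encoder
    """
    Q = decoder_states @ W_q  # queries come from the decoder
    K = encoder_states @ W_k  # keys come from the encoder
    V = encoder_states @ W_v  # values come from the encoder
    # Each decoder position scores every encoder position.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)  # (T_dec, T_enc), rows sum to 1
    context = weights @ V               # (T_dec, d_v)
    return context, weights

# Toy sizes, chosen arbitrarily for the example.
rng = np.random.default_rng(0)
d_model, d_head = 8, 4
encoder_states = rng.normal(size=(5, d_model))  # 5 source tokens
decoder_states = rng.normal(size=(3, d_model))  # 3 target positions
W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))

context, weights = cross_attention(decoder_states, encoder_states, W_q, W_k, W_v)
print(context.shape, weights.shape)
```

Each row of `weights` is one decoder position's distribution over the encoder tokens, which is the quantity an attention visualization typically shows. No causal mask is applied here, since the decoder is generally allowed to see the whole encoder output.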