Positional Encoding — Step-by-Step Visualization

easy · AI/ML · Transformer · NLP

Step through transformer positional encoding: watch sinusoidal waves of different frequencies get assigned to each position, so the model can tell token order without recurrence.

Algorithm Pattern

Sinusoidal Position Signals

Key Idea

Transformers process all tokens in parallel, and self-attention has no built-in notion of order. Positional encoding (PE) injects position through sine and cosine waves of different frequencies, so nearby positions receive similar encodings.
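To make the "no inherent order" point concrete, here is a minimal NumPy sketch of a single self-attention head with no positional signal. The function name self_attention, the random weights, and the toy sizes are illustrative assumptions, not part of the original page. Shuffling the input rows only shuffles the output rows in the same way, so the layer by itself cannot distinguish one token order from another.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with no positional signal."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8  # toy sizes chosen for illustration
x = rng.normal(size=(seq_len, d_model))  # stand-in token embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

perm = rng.permutation(seq_len)  # an arbitrary reordering of the tokens
out = self_attention(x, w_q, w_k, w_v)
out_shuffled = self_attention(x[perm], w_q, w_k, w_v)

# Each token gets the same output vector regardless of where it sits.
order_blind = np.allclose(out[perm], out_shuffled)
print(order_blind)
```

Adding a position-dependent vector to each row of x before the attention call is what breaks this symmetry.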

Step-by-Step Approach

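  1. PE(p, 2i) = sin(p / 10000^(2i / d_model)) for even dimensions, where p is the token position and i = 0, ..., d_model/2 - 1 indexes the sine/cosine pairs.
  2. PE(p, 2i+1) = cos(p / 10000^(2i / d_model)) for odd dimensions, at the same frequency as the paired even dimension.
  3. Low dimensions oscillate fast (position-sensitive); high dimensions change slowly. Wavelengths grow geometrically from 2π to 10000·2π.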
  4. PE is added to the token embedding before the attention layers.
  5. The dot product PE(p)·PE(p+k) equals the sum over i of cos(k / 10000^(2i / d_model)), so it depends only on the offset k and encodes relative distance.
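The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the function name sinusoidal_pe, the sizes (64 positions, d_model = 16), and the random token embeddings are placeholders chosen for the example, and d_model is assumed to be even.

```python
import numpy as np

def sinusoidal_pe(num_positions, d_model, base=10000.0):
    """Return a (num_positions, d_model) table of sinusoidal encodings."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    positions = np.arange(num_positions)[:, None]   # p, shape (P, 1)
    pair_index = np.arange(d_model // 2)[None, :]   # i, shape (1, d_model/2)
    angles = positions / base ** (2 * pair_index / d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)  # step 1: even dimensions
    pe[:, 1::2] = np.cos(angles)  # step 2: odd dimensions
    return pe

num_positions, d_model = 64, 16  # illustrative sizes
pe = sinusoidal_pe(num_positions, d_model)

# Step 4: add the encoding to the token embeddings (random stand-ins here).
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(num_positions, d_model))
model_input = token_embeddings + pe

# Step 5: for a fixed offset k, PE(p)·PE(p+k) is the same for every p.
k = 3
dots = np.array([pe[p] @ pe[p + k] for p in range(num_positions - k)])
same_for_all_p = np.allclose(dots, dots[0])

# Key idea: a neighbouring position is more similar than a distant one.
near = pe[10] @ pe[11]
far = pe[10] @ pe[40]
print(same_for_all_p, near > far)
```

The table depends only on num_positions and d_model, so it can be computed once and reused for every input sequence.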

Common Gotchas

  • PE is fixed (not learned) in the original paper; learned position embeddings gave nearly identical results there.
  • d_model is 512 in the base model of the original Transformer (1024 in the big model).
  • Rotary PE (RoPE), used in LLaMA and GPT-NeoX, and ALiBi, used in BLOOM, are modern alternatives.

Related Problems