Question 1

What is the algorithm pattern for Positional Encoding?

Accepted Answer

Sinusoidal position signals: self-attention processes all tokens in parallel and has no built-in notion of order. Positional encoding (PE) injects order through sine and cosine waves of different frequencies, so nearby positions have similar encodings.

Question 2

How do you solve Positional Encoding step by step?

Accepted Answer

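For position p and dimension-pair index i (0 ≤ i < d_model/2): PE(p, 2i) = sin(p / 10000^(2i / d_model)) for even dimensions, and PE(p, 2i+1) = cos(p / 10000^(2i / d_model)) for odd dimensions. Low dimensions oscillate fast and are sensitive to small changes in position; high dimensions change slowly and track coarse position. PE is added to the token embedding before the first attention layer. The dot product PE(p)·PE(p+k) depends only on the offset k, not on p: each sin/cos pair contributes sin(a)sin(b) + cos(a)cos(b) = cos(a − b), which here equals cos(k / 10000^(2i / d_model)), so the encoding carries relative distance.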
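The formulas above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `sinusoidal_pe` and `dot`, the `base` parameter, and the choice of 64 positions with d_model = 16 are assumptions made for the example, and the sketch assumes an even d_model.

```python
import math

def sinusoidal_pe(num_positions, d_model, base=10000.0):
    """Build a table of sinusoidal encodings: one d_model-length row per position."""
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")
    table = []
    for p in range(num_positions):
        row = [0.0] * d_model
        for i in range(d_model // 2):
            # Same angle for the sin/cos pair at dimensions 2i and 2i+1.
            angle = p / base ** (2 * i / d_model)
            row[2 * i] = math.sin(angle)      # even dimension
            row[2 * i + 1] = math.cos(angle)  # odd dimension
        table.append(row)
    return table

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Small sizes chosen only for illustration.
pe = sinusoidal_pe(num_positions=64, d_model=16)

# Position 0 gives sin(0) = 0 in even dimensions and cos(0) = 1 in odd ones.
print(pe[0][:4])  # [0.0, 1.0, 0.0, 1.0]

# The same offset k = 3 gives the same dot product from different start positions.
print(math.isclose(dot(pe[5], pe[8]), dot(pe[40], pe[43]), abs_tol=1e-9))  # True

# Adjacent positions are more similar than distant ones.
print(dot(pe[10], pe[11]) > dot(pe[10], pe[40]))  # True
```

In a model, the row `pe[p]` would be added element-wise to the embedding of the token at position p before the first attention layer.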

Question 3

What are common mistakes when solving Positional Encoding?

Accepted Answer

Points that are easy to get wrong: PE is fixed (not learned) in the original paper, which reports that learned position embeddings perform about the same. d_model is 512 in the base model of the original Transformer. Rotary PE (RoPE), used in LLaMA and GPT-NeoX, and ALiBi, used in BLOOM, are modern alternatives.
