Softmax — Step-by-Step Visualization

Easy · AI/ML · Classification · Neural Network

Step through the softmax function — watch raw logits exponentiated and normalized into a valid probability distribution.

Algorithm Pattern

Exponential Normalization

Key Idea

Softmax converts any real vector into a probability distribution. Exponentiation amplifies differences: the largest logit gets a disproportionately large share — softmax is 'winner-take-most'.
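
To make the "winner-take-most" effect concrete, here is a minimal sketch that compares softmax with a plain proportional share of the same logits. The helper name `softmax`, the example logits, and the `linear_share` baseline are illustrative assumptions, not part of the visualization itself.

```python
import math

def softmax(logits):
    """Plain softmax: exponentiate each logit, then divide by the sum."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 2.0, 3.0]  # illustrative values

# Baseline for comparison only: each logit divided by the plain sum.
linear_share = [z / sum(logits) for z in logits]
probs = softmax(logits)

print([round(s, 3) for s in linear_share])  # [0.167, 0.333, 0.5]
print([round(p, 3) for p in probs])         # [0.09, 0.245, 0.665]
```

The largest logit holds half of the linear share but roughly two thirds of the softmax mass, while the smallest drops from about 17% to about 9%.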

Step-by-Step Approach

  1. For each logit z_i: compute exp(z_i).
  2. Sum all exponentials: S = Σ exp(z_i).
  3. Probability: p_i = exp(z_i) / S.
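  4. Each p_i is strictly positive and the p_i sum to 1 (up to floating-point rounding).
  5. In practice, subtract max(z) from every logit before step 1 for numerical stability. The shift cancels in the ratio, so the output is unchanged.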
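
The steps above can be sketched in plain Python as follows. This is one possible implementation under stated assumptions: the function names `softmax_naive` and `softmax_stable` and the sample logits are hypothetical, chosen only to contrast the unshifted and max-shifted versions.

```python
import math

def softmax_naive(logits):
    # Steps 1-3 exactly as listed, with no stability shift.
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(logits):
    # Step 5 applied first: shift by the maximum so the largest
    # exponent is exp(0) = 1 and nothing can overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# On moderate logits both versions agree.
small = [2.0, 1.0, 0.1]
naive_small = softmax_naive(small)
stable_small = softmax_stable(small)

# On large logits math.exp overflows in the naive version,
# while the shifted version still returns a valid distribution.
large = [1000.0, 999.0, 998.0]
try:
    softmax_naive(large)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

stable_large = softmax_stable(large)
print(naive_overflowed)                        # True
print([round(p, 3) for p in stable_large])     # [0.665, 0.245, 0.09]
```

The shift works because exp(z_i − m) / Σ exp(z_j − m) equals exp(z_i) / Σ exp(z_j): the common factor exp(−m) cancels between numerator and denominator.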

Common Gotchas

  • Softmax is most commonly used at the output layer of a classifier. Hidden layers usually use activations such as ReLU, although attention layers do apply softmax internally.
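  • The gradient of softmax combined with cross-entropy loss, taken with respect to the logits, is elegantly simple: ŷ − y (predicted probabilities minus the one-hot target).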
  • Temperature scaling: softmax(z/T) with T > 0. T<1 sharpens, T>1 flattens the distribution.
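
The gradient and temperature points can be checked numerically. The sketch below is illustrative only: the `temperature` parameter, the `cross_entropy` helper, the sample logits, and the finite-difference step `eps` are all assumptions introduced for this example. It compares ŷ − y against a central finite-difference estimate of the cross-entropy gradient, then evaluates softmax at two temperatures.

```python
import math

def softmax(logits, temperature=1.0):
    # softmax(z / T), computed with the max shift for stability.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    # Loss for a one-hot target: -log of the probability of the true class.
    return -math.log(softmax(logits)[target_index])

logits = [2.0, 1.0, 0.1]  # illustrative values
target = 0
one_hot = [1.0 if i == target else 0.0 for i in range(len(logits))]

# Analytic gradient with respect to the logits: y_hat - y.
analytic = [p - y for p, y in zip(softmax(logits), one_hot)]

# Central finite differences as an independent check.
eps = 1e-6
numeric = []
for i in range(len(logits)):
    up = list(logits)
    up[i] += eps
    down = list(logits)
    down[i] -= eps
    numeric.append((cross_entropy(up, target) - cross_entropy(down, target)) / (2 * eps))

# Temperature: T < 1 concentrates mass on the top logit, T > 1 spreads it out.
base = softmax(logits)
sharp = softmax(logits, temperature=0.5)
flat = softmax(logits, temperature=5.0)
print(max(sharp) > max(base) > max(flat))  # True
```

The two gradient estimates should agree to within the finite-difference error, which supports the ŷ − y shortcut for this loss and target encoding.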

Related Problems