
Batch Normalization — Step-by-Step Visualization

easy, AI, ML, Normalization, Deep Learning

Step through batch normalization: watch a batch of activations get centered and scaled to unit variance, then rescaled by a learned gamma and shifted by a learned beta.

Algorithm Pattern

Per-Batch Statistics Normalization

Key Idea

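BatchNorm normalizes each feature's activations to zero mean and unit variance across the batch, then applies a learned scale (γ) and shift (β) so the layer can still represent any mean and variance it needs. It was originally motivated as a way to reduce internal covariate shift; in practice it stabilizes training and allows higher learning rates.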

Step-by-Step Approach

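  1. Compute the batch mean of each feature: μ = mean(x), taken over the batch dimension.
  2. Compute the batch variance: σ² = mean((x − μ)²). This is the biased estimate (divide by the batch size N, not N − 1).
  3. Normalize: x̂ = (x − μ) / √(σ² + ε), where ε is a small constant that prevents division by zero.
  4. Scale and shift: y = γ·x̂ + β (γ and β are learned parameters).
  5. At inference, use the running mean and variance accumulated during training instead of batch statistics.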
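
The five steps above can be sketched in plain Python for a single feature, treating the batch as a list of scalar activations. This is a minimal illustration, not a framework implementation: the function names, the default ε = 1e-5, and the exponential-moving-average update with momentum = 0.1 are assumptions chosen for the example (the steps above do not say how the running statistics are accumulated).

```python
import math

def batch_norm_train(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode BatchNorm for one feature over a batch of scalars."""
    n = len(x)
    mu = sum(x) / n                                        # step 1: batch mean
    var = sum((v - mu) ** 2 for v in x) / n                # step 2: biased variance
    x_hat = [(v - mu) / math.sqrt(var + eps) for v in x]   # step 3: normalize
    y = [gamma * v + beta for v in x_hat]                  # step 4: scale and shift
    return y, mu, var

def update_running(running, batch_value, momentum=0.1):
    """Assumed update rule: exponential moving average of a batch statistic."""
    return (1.0 - momentum) * running + momentum * batch_value

def batch_norm_infer(x, running_mean, running_var, gamma=1.0, beta=0.0, eps=1e-5):
    """Step 5: inference uses fixed running statistics, not the batch's own."""
    inv_std = 1.0 / math.sqrt(running_var + eps)
    return [gamma * (v - running_mean) * inv_std + beta for v in x]

batch = [2.0, 4.0, 6.0, 8.0]
y, mu, var = batch_norm_train(batch, gamma=2.0, beta=1.0)
print(mu, var)   # batch statistics used for normalization
print(y)         # output has mean close to beta and variance close to gamma**2
```

After normalization the output's mean equals β and its variance is approximately γ², which is why γ and β let the network undo the normalization if that turns out to be useful.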

Common Gotchas

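  • BatchNorm behaves differently in training and inference, a common source of bugs (for example, forgetting to switch the model to evaluation mode).
  • Small batch sizes make batch statistics noisy; consider GroupNorm or LayerNorm, which do not depend on the batch dimension.
  • BatchNorm introduces a dependency between samples in the same batch during training: each sample's output depends on the statistics of the others.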
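
The first and third gotchas can be seen in a small experiment. The sketch below normalizes the same sample inside two different batches, once with batch statistics (training mode) and once with fixed statistics (inference mode). The helper names and the running statistics (mean 0, variance 1) are hypothetical values chosen for illustration.

```python
import math

def batch_stats(x):
    """Biased mean and variance of a batch of scalar activations."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return mu, var

def normalize(x, mean, var, eps=1e-5):
    return [(v - mean) / math.sqrt(var + eps) for v in x]

# Hypothetical running statistics, standing in for values learned in training.
running_mean, running_var = 0.0, 1.0

batch_a = [1.0, 2.0, 3.0]
batch_b = [1.0, 2.0, 30.0]   # same first two samples, different third

# Training mode: each batch uses its own statistics, so the result for the
# sample 1.0 changes when its batch mates change.
train_a = normalize(batch_a, *batch_stats(batch_a))
train_b = normalize(batch_b, *batch_stats(batch_b))

# Inference mode: statistics are fixed, so each sample is handled independently.
infer_a = normalize(batch_a, running_mean, running_var)
infer_b = normalize(batch_b, running_mean, running_var)

print(train_a[0], train_b[0])   # two different values for the same input
print(infer_a[0], infer_b[0])   # the same value twice
```

Running a model in the wrong mode therefore changes its outputs even when the weights are identical, which is why this mismatch is easy to miss.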
