Question 1

What is the algorithm pattern for Batch Normalization?

Accepted Answer

Per-Batch Statistics Normalization: BatchNorm normalizes activations to zero mean and unit variance across the batch, then applies learned scale (γ) and shift (β). This reduces internal covariate shift and allows higher learning rates.

Question 2

How do you solve Batch Normalization step by step?

Accepted Answer

Compute batch mean: μ = mean(x). Compute batch variance: σ² = mean((x − μ)²). Normalize: x̂ = (x − μ) / √(σ² + ε). Scale and shift: y = γ·x̂ + β  (learned parameters). At inference, use running mean/variance instead of batch statistics.

Question 3

What are common mistakes when solving Batch Normalization?

Accepted Answer

BatchNorm behaves differently during training vs inference — a common source of bugs. Small batch sizes make batch statistics noisy — use GroupNorm or LayerNorm instead. BatchNorm inserts a dependency between samples in the same batch.

Batch Normalization — Step-by-Step Visualization

Algorithm Pattern

Key Idea

Step-by-Step Approach

Common Gotchas

Related Problems