Step through batch normalization — watch a batch of activations centered, scaled, then shifted by learned gamma and beta parameters.
Per-Batch Statistics Normalization
BatchNorm normalizes activations to zero mean and unit variance across the batch, then applies learned scale (γ) and shift (β). This reduces internal covariate shift and allows higher learning rates.