
Backpropagation — Step-by-Step Visualization

Tags: hard, AI, DL, Neural Network, Gradient

Step through the forward and backward passes of a 2-layer neural network and watch how gradients flow from the loss back to each weight via the chain rule.

Algorithm Pattern

Chain Rule of Calculus

Key Idea

Backprop computes how much each weight contributed to the loss by applying the chain rule backwards through the computation graph.
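
For the 2-layer sigmoid network used here, the chain rule can be written out as below. This is a sketch under assumptions the page does not state: column vectors, no bias terms, forward definitions z1 = W1 x and z2 = W2 a1, and the shorthand symbols δ1 and δ2 for dL/dz1 and dL/dz2. The symbol ⊙ denotes elementwise multiplication.

```latex
\begin{aligned}
z_1 &= W_1 x, \qquad a_1 = \sigma(z_1), \qquad z_2 = W_2 a_1, \qquad a_2 = \sigma(z_2), \\
L &= \tfrac{1}{2}\,\lVert a_2 - y \rVert^2, \\
\delta_2 &= \frac{\partial L}{\partial z_2} = (a_2 - y) \odot a_2 \odot (1 - a_2), \\
\frac{\partial L}{\partial W_2} &= \delta_2\, a_1^{\top}, \\
\delta_1 &= \frac{\partial L}{\partial z_1} = \left(W_2^{\top} \delta_2\right) \odot a_1 \odot (1 - a_1), \\
\frac{\partial L}{\partial W_1} &= \delta_1\, x^{\top}.
\end{aligned}
```

Each gradient reuses the δ computed for the layer after it, which is why the pass runs from the output back to the input.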

Step-by-Step Approach

  1. Run the forward pass: compute z1, a1 (hidden), z2, a2 (output).
  2. Compute the loss (MSE: 0.5 * (a2 - y)²).
  3. Compute dL/da2 = a2 - y (output error).
  4. Multiply elementwise by the sigmoid derivative da2/dz2 = a2*(1-a2) to get dL/dz2.
  5. Compute dW2 = dL/dz2 @ a1.T (gradient for output weights).
  6. Propagate error back: dL/da1 = W2.T @ dL/dz2.
  7. Multiply elementwise by the sigmoid derivative da1/dz1 = a1*(1-a1) to get dL/dz1.
  8. Compute dW1 = dL/dz1 @ x.T (gradient for hidden weights).
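
The eight steps above can be sketched in NumPy as follows. This is a minimal illustration, not the visualization's own code. It assumes column-vector inputs, no bias terms, and z1 = W1 @ x, z2 = W2 @ a1; the function name `forward_backward`, the layer sizes, and the random seed are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(x, y, W1, W2):
    """One forward and backward pass (assumed: column vectors, no biases)."""
    # Step 1: forward pass
    z1 = W1 @ x
    a1 = sigmoid(z1)          # hidden activations
    z2 = W2 @ a1
    a2 = sigmoid(z2)          # output activations
    # Step 2: MSE loss, 0.5 * (a2 - y)^2 summed over outputs
    loss = 0.5 * np.sum((a2 - y) ** 2)
    # Step 3: output error
    dL_da2 = a2 - y
    # Step 4: elementwise product with the sigmoid derivative a2 * (1 - a2)
    dL_dz2 = dL_da2 * a2 * (1.0 - a2)
    # Step 5: gradient for output weights (outer product with a1)
    dW2 = dL_dz2 @ a1.T
    # Step 6: propagate the error back to the hidden layer
    dL_da1 = W2.T @ dL_dz2
    # Step 7: elementwise product with the sigmoid derivative a1 * (1 - a1)
    dL_dz1 = dL_da1 * a1 * (1.0 - a1)
    # Step 8: gradient for hidden weights (outer product with x)
    dW1 = dL_dz1 @ x.T
    return loss, dW1, dW2

# Hypothetical sizes: 3 inputs, 4 hidden units, 1 output
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))
y = np.array([[1.0]])
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(1, 4))

loss, dW1, dW2 = forward_backward(x, y, W1, W2)
print(loss, dW1.shape, dW2.shape)
```

With these shapes, `dW1` comes out as (4, 3) and `dW2` as (1, 4), matching `W1` and `W2`. One way to check the analytic gradients is to compare them with a central finite difference of the loss.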

Common Gotchas

  • The sigmoid derivative is σ'(z) = σ(z)*(1-σ(z)), so reuse the already-computed activations (a*(1-a)) instead of recomputing σ(z).
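  • Gradient dimensions must match weight matrix dimensions exactly: dW1 has the shape of W1, and dW2 the shape of W2.
  • Vanishing gradients: σ'(z) is at most 0.25 and approaches 0 as the sigmoid output saturates near 0 or 1, so gradients become tiny in deep nets.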
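
The first and third gotchas can be illustrated with a short sketch. The helper name `sigmoid_grad_from_activation` and the depth of 10 are hypothetical choices for illustration. The bound at the end ignores the weight factors that also multiply into a real gradient, so it only shows how fast the sigmoid terms alone can shrink.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad_from_activation(a):
    # `a` is sigma(z), already computed in the forward pass,
    # so the derivative needs no extra call to exp().
    return a * (1.0 - a)

# Local derivative at a saturated-low, centered, and saturated-high input
z = np.array([-10.0, 0.0, 10.0])
a = sigmoid(z)
local_grads = sigmoid_grad_from_activation(a)
print(local_grads)

# Each sigmoid layer contributes a factor of at most 0.25, so the product
# of the sigmoid terms alone across `depth` layers is bounded by 0.25**depth.
depth = 10  # hypothetical depth
upper_bound = 0.25 ** depth
print(upper_bound)
```

The derivative peaks at 0.25 for z = 0 and falls below 1e-4 at z = ±10, which is the saturation effect described above.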

Related Problems