
LSTM — Step-by-Step Visualization

Medium · AI/ML · Deep Learning · Sequence

Step through an LSTM cell — watch the forget gate clear old memory, the input gate write new values, and the output gate produce the hidden state.

Algorithm Pattern

Gated Memory Cell

Key Idea

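LSTMs mitigate the vanishing-gradient problem of plain RNNs by keeping a persistent cell state that is updated additively. Three multiplicative gates (forget, input, output) control what is erased from that state, what is written to it, and what is exposed as the hidden state.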
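A gate is a sigmoid output multiplied element-wise into a vector. Because the sigmoid lies in (0, 1), it acts as a soft, differentiable switch. Here is a minimal plain-Python sketch. The `gate` helper and the pre-activation values are hypothetical and exist only for illustration:

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function, output in (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gate(pre_activations, values):
    # Hypothetical helper: scale each value by the sigmoid of its pre-activation.
    return [sigmoid(z) * v for z, v in zip(pre_activations, values)]

memory = [2.0, -1.5, 0.7]
# Large positive -> pass through, large negative -> block, zero -> halve.
print(gate([6.0, -6.0, 0.0], memory))  # roughly [1.995, -0.0037, 0.35]
```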

Step-by-Step Approach

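  1. Forget gate: f = σ(Wf·[h_prev, x] + bf); controls how much of c_prev to keep.
  2. Input gate: i = σ(Wi·[h_prev, x] + bi); controls how much of the candidate to write.
  3. Cell candidate: g = tanh(Wg·[h_prev, x] + bg); proposes new values in (−1, 1).
  4. New cell state: c = f⊙c_prev + i⊙g; blends old memory with new.
  5. Output gate and hidden state: o = σ(Wo·[h_prev, x] + bo); h = o⊙tanh(c).

Here x is the current input, [h_prev, x] concatenates the previous hidden state with x, σ is the logistic sigmoid (output in (0, 1)), and ⊙ is element-wise multiplication.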
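The following is a minimal pure-Python sketch of one forward step that follows steps 1 to 5 above. It makes several assumptions. Vectors are plain lists, and each W is a list of rows over the concatenated [h_prev, x]. The parameter dictionary layout, the `init_params` helper, and its uniform random initialization are hypothetical choices made only so the example runs:

```python
import math
import random

def sigmoid(z: float) -> float:
    # Numerically stable logistic function, output in (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def affine(W, b, v):
    # Returns W·v + b, where W is a list of rows, each the same length as v.
    return [sum(w * vk for w, vk in zip(row, v)) + bk for row, bk in zip(W, b)]

def lstm_cell(x, h_prev, c_prev, p):
    hx = h_prev + x  # list concatenation: [h_prev, x]
    f = [sigmoid(z) for z in affine(p["Wf"], p["bf"], hx)]    # 1. forget gate
    i = [sigmoid(z) for z in affine(p["Wi"], p["bi"], hx)]    # 2. input gate
    g = [math.tanh(z) for z in affine(p["Wg"], p["bg"], hx)]  # 3. cell candidate
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]  # 4. new cell state
    o = [sigmoid(z) for z in affine(p["Wo"], p["bo"], hx)]    # 5. output gate
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]          #    hidden state
    return h, c

def init_params(hidden, inputs, seed=0):
    # Hypothetical initialization: small uniform weights, zero biases.
    rng = random.Random(seed)
    p = {}
    for name in "figo":
        p["W" + name] = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + inputs)]
                         for _ in range(hidden)]
        p["b" + name] = [0.0] * hidden
    return p

# Usage: run the cell over a short, made-up input sequence.
H, X = 3, 2
params = init_params(H, X)
h, c = [0.0] * H, [0.0] * H
for x in [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]:
    h, c = lstm_cell(x, h, c, params)
print("h =", h)
print("c =", c)
```

A very large forget bias combined with a very negative input bias drives f to about 1 and i to about 0. In that case c stays essentially equal to c_prev, and the cell simply carries its memory forward.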

Common Gotchas

  • Forget gate output near 1 means KEEP (not forget) — the name is counterintuitive.
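  • The cell state c is updated only by element-wise multiplication (by f) and addition. Along this direct path the Jacobian ∂c/∂c_prev is diag(f), so when f stays near 1, gradients flow back through many steps without being repeatedly squashed. This is the "gradient highway".
  • LSTMs were the standard choice for many sequence tasks (e.g. machine translation, speech recognition) until Transformers displaced them. They are still used for streaming inference, where a fixed-size recurrent state keeps memory and compute per step constant.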
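The next sketch illustrates the gradient-highway point. It assumes scalar states and gate values held fixed at hypothetical constants. In a real LSTM, f, i and g depend on h_prev and x, which adds further gradient paths. The code carries each value together with its derivative with respect to the initial state (forward-mode differentiation). It does this for the LSTM cell-state path and, for contrast, for a scalar vanilla RNN h_t = tanh(w·h_{t-1} + x_t). The function names and the constants 0.98, 0.01, 0.9 and 0.5 are illustrative choices:

```python
import math

def cell_state_path(c0, forget_gates, writes):
    # Track c_t and dc_t/dc_0 along the LSTM cell-state path, with gate
    # values held fixed (hypothetical constants, not computed from inputs).
    c, dc = c0, 1.0
    for f, w in zip(forget_gates, writes):
        c = f * c + w   # c_t = f_t * c_{t-1} + (i_t * g_t)
        dc = f * dc     # element-wise path: the derivative just picks up f_t
    return c, dc

def vanilla_rnn_path(h0, w_rec, inputs):
    # Scalar vanilla RNN h_t = tanh(w_rec * h_{t-1} + x_t), for contrast.
    h, dh = h0, 1.0
    for x in inputs:
        h = math.tanh(w_rec * h + x)
        dh = (1.0 - h * h) * w_rec * dh  # tanh'(z) = 1 - tanh(z)^2
    return h, dh

T = 50
_, lstm_grad = cell_state_path(0.0, [0.98] * T, [0.01] * T)
_, rnn_grad = vanilla_rnn_path(0.0, 0.9, [0.5] * T)
print(lstm_grad)  # equals 0.98**50, about 0.364
print(rnn_grad)   # far smaller: each step multiplies by at most 0.9
```

With f = 0.98, the cell-state gradient after 50 steps is 0.98^50 ≈ 0.364. Each vanilla-RNN step instead multiplies by w·(1 − h_t²) ≤ 0.9, so that gradient is bounded by 0.9^50 ≈ 0.005, and in this run it is far smaller still.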
