Word2Vec — Step-by-Step Visualization

medium · AIML · NLP · Embeddings

Step through Word2Vec skip-gram — watch a center word predict context words through embedding lookup, dot products, and softmax.

Algorithm Pattern

Shared Embedding Matrix

Key Idea

Word2Vec (skip-gram) trains a shallow network to predict context words from a center word. The rows of the input weight matrix become the embeddings, and words that appear in similar contexts end up geometrically close.

Step-by-Step Approach

  1. One-hot encode the center word.
  2. Embedding lookup: multiply one-hot by W to get the center word vector.
  3. Score each vocab word i: dot(center_emb, context_emb_i), where context_emb_i is word i's vector on the output (context) side.
  4. Softmax over scores → predicted context probabilities.
  5. Gradient descent on the cross-entropy loss increases P(actual context words).
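
The five steps above can be sketched in plain Python for a single (center, context) pair. This is a minimal illustration, not a reference implementation: the names `W_in`, `W_out`, and `skipgram_step`, the learning rate, and the toy vocabulary are all assumptions made for the example.

```python
import math
import random

def one_hot(index, size):
    """Step 1: a vector of zeros with a single 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def vec_mat(vec, mat):
    """Row vector times matrix, returned as a list of length len(mat[0])."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def skipgram_step(W_in, W_out, center, context, lr=0.1):
    """One SGD step; returns P(context | center) as it was before the update."""
    V = len(W_in)
    # Step 2: one-hot times W_in selects row `center` of W_in
    h = vec_mat(one_hot(center, V), W_in)
    # Step 3: one score per vocabulary word
    scores = [dot(h, W_out[i]) for i in range(V)]
    # Step 4: scores -> probabilities
    probs = softmax(scores)
    # Step 5: the gradient of -log(probs[context]) w.r.t. the scores
    # is probs - one_hot(context)
    err = [p - t for p, t in zip(probs, one_hot(context, V))]
    grad_h = [sum(err[i] * W_out[i][d] for i in range(V)) for d in range(len(h))]
    for i in range(V):
        for d in range(len(h)):
            W_out[i][d] -= lr * err[i] * h[d]
    for d in range(len(h)):
        W_in[center][d] -= lr * grad_h[d]
    return probs[context]

# Toy usage: repeat the update for the pair ("king", "queen")
random.seed(0)
vocab = ["king", "queen", "man", "woman"]
W_in = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in vocab]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in vocab]
history = [skipgram_step(W_in, W_out, center=0, context=1) for _ in range(20)]
```

In `history`, the probability assigned to the true context word should rise as the updates repeat. Note that only row `center` of `W_in` changes in a step, while every row of `W_out` does, which is the cost that motivates negative sampling.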

Common Gotchas

  • The embedding IS the weight row — not a separate computation.
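  • Negative sampling replaces full softmax for large vocabularies: the softmax touches every vocabulary word per training pair, while negative sampling scores only the true context word and a few sampled negatives.
  • king − man + woman ≈ queen emerges purely from co-occurrence statistics. The relation is approximate and is usually checked by nearest-neighbor search that excludes the query words.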
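
The negative-sampling gotcha can be made concrete with a short sketch. It assumes the negative indices are supplied by the caller and do not include the true context word. How they are drawn is left out; the original Word2Vec samples them from a smoothed unigram distribution. The names `neg_sampling_step`, `W_in`, and `W_out` are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def neg_sampling_step(W_in, W_out, center, context, negatives, lr=0.1):
    """One SGD step on the negative-sampling loss; returns the loss before the update."""
    h = W_in[center][:]  # the embedding is just a row read, no matrix product needed
    grad_h = [0.0] * len(h)
    loss = 0.0
    # The true context word gets label 1, each sampled negative gets label 0
    pairs = [(context, 1.0)] + [(n, 0.0) for n in negatives]
    for idx, label in pairs:
        p = sigmoid(dot(h, W_out[idx]))
        loss -= math.log(p if label == 1.0 else 1.0 - p)
        g = p - label  # gradient of the logistic loss w.r.t. the score
        for d in range(len(h)):
            grad_h[d] += g * W_out[idx][d]
            W_out[idx][d] -= lr * g * h[d]
    for d in range(len(h)):
        W_in[center][d] -= lr * grad_h[d]
    return loss
```

Each step updates only 1 + len(negatives) rows of `W_out` instead of the whole vocabulary, so the per-pair cost no longer grows with vocabulary size.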
