RAG — Step-by-Step Visualization

medium, AI, ML, Generative AI, NLP, LLM

Step through Retrieval-Augmented Generation (RAG): watch a query get embedded, see the most relevant documents retrieved by similarity, and see the LLM prompt augmented with them to produce a grounded answer.

Algorithm Pattern

Retrieve Then Generate

Key Idea

RAG reduces LLM hallucination by grounding generation in retrieved facts. The query is embedded, compared against the document embeddings in a vector store, and the top-k matches are injected into the prompt before generation.

Step-by-Step Approach

  1. Embed the query into a dense vector using an embedding model.
  2. Compute the cosine similarity between the query embedding and every document embedding.
  3. Retrieve the top-k most similar documents.
  4. Concatenate the query and the retrieved text into one prompt: prompt = query + retrieved_context.
  5. The LLM generates an answer conditioned on the augmented prompt.
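
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: `embed` is a toy bag-of-words stand-in for a real embedding model, and `retrieve`, `build_prompt`, and the sample documents are hypothetical names and data chosen for the example. The final LLM call (step 5) is left out, since it depends on whichever model API is in use.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Steps 1-3: embed the query, score every document, keep the top k."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(query: str, context_docs: list[str]) -> str:
    """Step 4: combine the retrieved context and the query into one prompt."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


documents = [
    "the eiffel tower is in paris",
    "python is a programming language",
    "paris is the capital of france",
]
query = "where is the eiffel tower"

top_docs = retrieve(query, documents, k=2)
prompt = build_prompt(query, top_docs)
print(prompt)  # Step 5 would send this prompt to an LLM.
```

In a real system the document embeddings would be computed once and stored in a vector index, rather than recomputed on every query as this sketch does.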

Common Gotchas

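  • RAG is only as current as its vector store: stale or missing documents produce stale answers, so re-index when the sources change.
  • Retrieval quality is often the bottleneck: if the relevant passage is not among the top-k results, the LLM cannot ground its answer in it, so better embeddings usually mean better answers.
  • Chunking strategy matters: chunks that are too small lose their surrounding context, and chunks that are too large dilute the relevant passage with unrelated text.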
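
To make the chunking trade-off concrete, here is a small sketch of fixed-size chunking with overlap. Overlap is one common way to keep some context across chunk boundaries; it is an assumption added here, not something this page prescribes. The function name `chunk_words` and its default sizes are hypothetical and would need tuning for real documents.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of `chunk_size` words, each sharing
    `overlap` words with the previous chunk."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
print(chunks)
```

A smaller `chunk_size` gives more precise matches but less context per chunk; a larger one does the opposite. Each chunk would then be embedded and stored as its own entry in the vector store.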

Related Problems