Understanding RAG Implementation - Simple Mathematical Walkthrough

Understanding RAG with a Simple Example

Most explanations of Retrieval-Augmented Generation (RAG) get bogged down in technical diagrams. Let me show you how it works using basic math and a simple example.

Let’s say we have these documents: “Fry some eggs”, “Scramble eggs”, “Fix a flat tire”

Breaking Down the Text

First we split everything into pieces:

Text0: "Fry some eggs"
Text1: "Scramble eggs" 
Text2: "Fix a flat tire"

Converting to Vectors

Next, we turn each text into numbers using a neural network. Think of it like GPS coordinates, but for meaning. Here are some example 4D vectors (real embeddings have hundreds or thousands of dimensions):

Vec0 = [0.85, 0.15, 0.05, 0.12]   # "Fry some eggs"
Vec1 = [0.82, 0.18, 0.03, 0.11]   # "Scramble eggs"
Vec2 = [-0.25, 0.35, 0.75, 0.15]  # "Fix a flat tire"

Making Them Standard Size

We rescale each vector to length 1 by dividing every component by the vector's magnitude (its Euclidean norm). That way, the dot product of two vectors is exactly their cosine similarity:

Vec0_norm = [0.974, 0.172, 0.057, 0.137]
Vec1_norm = [0.968, 0.212, 0.035, 0.130]
Vec2_norm = [-0.285, 0.399, 0.855, 0.171]
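The normalization step can be sketched in a few lines of NumPy, using the toy 4D vectors from this walkthrough (not real embeddings):

```python
import numpy as np

# Toy 4D vectors from the walkthrough (real embeddings are much larger)
vectors = np.array([
    [0.85, 0.15, 0.05, 0.12],   # "Fry some eggs"
    [0.82, 0.18, 0.03, 0.11],   # "Scramble eggs"
    [-0.25, 0.35, 0.75, 0.15],  # "Fix a flat tire"
])

# Divide each vector by its Euclidean norm so every row has length 1
norms = np.linalg.norm(vectors, axis=1, keepdims=True)
unit_vectors = vectors / norms

print(np.round(unit_vectors, 3))
```

After this, every row has length 1, so a plain dot product between any two rows is their cosine similarity.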

Storing Everything

We put these vectors in a database and remember which text goes with which number.
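A minimal in-memory version of that mapping is just a list of records. Real systems use a vector database with an index, but the bookkeeping idea is the same:

```python
# A toy "vector store": each entry keeps an id, the normalized vector,
# and the original text so we can recover it after a similarity search
store = [
    {"id": 0, "vector": [0.974, 0.172, 0.057, 0.137], "text": "Fry some eggs"},
    {"id": 1, "vector": [0.968, 0.212, 0.035, 0.130], "text": "Scramble eggs"},
    {"id": 2, "vector": [-0.285, 0.399, 0.855, 0.171], "text": "Fix a flat tire"},
]

def lookup(doc_id):
    """Recover the original text for a stored vector's id."""
    return store[doc_id]["text"]

print(lookup(1))  # Scramble eggs
```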

Finding Similar Content

When someone asks “What’s the best way to prepare eggs?”, we:

  1. Convert their question to a vector (also normalized): [0.975, 0.165, 0.02, 0.145]
  2. Compare it to all stored vectors using dot products
  3. Pick the most similar ones

For example:

  • Question · Vec0_norm ≈ 0.999 (very similar)
  • Question · Vec1_norm ≈ 0.998 (very similar)
  • Question · Vec2_norm ≈ -0.170 (not similar)
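The three steps together amount to one matrix-vector product followed by a sort. Here is a pure-NumPy sketch; the scores are recomputed from the rounded unit vectors, so they may differ by a hair from hand-rounded figures:

```python
import numpy as np

texts = ["Fry some eggs", "Scramble eggs", "Fix a flat tire"]
unit_vectors = np.array([
    [0.974, 0.172, 0.057, 0.137],
    [0.968, 0.212, 0.035, 0.130],
    [-0.285, 0.399, 0.855, 0.171],
])

# The question, already embedded as a (roughly unit-length) vector
question = np.array([0.975, 0.165, 0.02, 0.145])

# One matrix-vector product gives every similarity score at once
scores = unit_vectors @ question

# Rank documents from most to least similar and keep the top k
top_k = 2
ranked = np.argsort(scores)[::-1][:top_k]
for i in ranked:
    print(f"{scores[i]:.3f}  {texts[i]}")
# → 0.999  Fry some eggs
# → 0.998  Scramble eggs
```

Both egg documents rank near the top while the tire document scores negative, which is exactly why we hand only the egg chunks to the model.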

So we grab the egg cooking instructions and feed them to the AI to generate a good answer.
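That last step is mostly prompt assembly. A minimal sketch, where the prompt wording is just one reasonable choice and `ask_llm` is a hypothetical placeholder rather than any real library call:

```python
retrieved = ["Fry some eggs", "Scramble eggs"]
question = "What's the best way to prepare eggs?"

# Stuff the retrieved chunks into the prompt as context for the model
context = "\n".join(f"- {chunk}" for chunk in retrieved)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)

print(prompt)
# The assembled prompt then goes to whatever LLM API you use, e.g.:
# answer = ask_llm(prompt)  # hypothetical call, not a real function
```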

Great walkthrough of the core mechanics. I built a RAG document search system last year and your example nails exactly how retrieval works. The normalization part is huge - skip it and you’ll get wonky similarity scores that mess up your rankings. One heads-up though: real embedding models like sentence-transformers are way more complex than your 4D example. When I was debugging our system, tiny wording changes would shift vectors all over the place. You showed the dot product for similarity, which most people use, but cosine similarity usually works better for text since it handles different magnitudes more smoothly (with normalized vectors the two are identical). And that retrieval threshold? It’s make-or-break in production. Set it too strict and you miss good context. Too loose and you flood your prompt with garbage.

This math breakdown really shows what’s happening behind the scenes. I’ve worked with RAG systems for two years and people always get confused about why vector similarity works so well. You nailed the key point - semantically similar content clusters together in vector space. Your egg cooking examples both hit positive values in similar dimensions, while tire repair is completely different.

What’s interesting is that retrieval quality depends heavily on how you chunk documents upfront. Chunks that are too small lose context; ones that are too large make the vectors generic. In production I’ve seen embedding dimensions from 384 to 1536 - the performance difference is huge with large knowledge bases.

this is way clearer than most rag tutorials i’ve seen. the vector math part finally clicked for me - never realized how the dot product actually measures similarity like that. i’ve been trying to build something similar but kept getting confused by all the fancy terminology. quick question though - do you always need to normalize vectors or does it depend on the embedding model you’re using?

Love how you cut through the academic BS and got straight to the point. I’ve been building RAG systems for years and this is exactly how I explain it to juniors.

Vector database choice will bite you hard if you’re not careful. Started with something simple that handled 10k documents fine, but at 100k+ query times exploded. Had to migrate to proper indexing.

Also - your retrieval is only as good as your preprocessing. Spent weeks debugging irrelevant chunks, turned out garbage data was polluting the vector space. Now I always quality-check embeddings before indexing.

Your dot product example nails it. In practice though, tuning top-k based on query complexity makes a huge difference. Simple questions need fewer chunks, complex ones need you to cast a wider net.