
RAG (Retrieval Augmented Generation)

A technique that lets an AI chatbot answer from your own documents (catalog, FAQ, policies), not just from what the model learned during training. This makes its answers more accurate and lets you update what it knows without retraining the model.

In short: what RAG does

RAG stands for "Retrieval Augmented Generation". A chatbot without RAG answers only from what the model already knows (general training data up to a cutoff date). A chatbot with RAG first searches your documents for the 3-5 passages most relevant to the question, then the model uses those passages to answer, citing them as sources.
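To make the "augmented" part concrete: the retrieved passages are pasted into the prompt next to the question, numbered so the model can cite them. A minimal sketch; the function name `build_rag_prompt`, the prompt wording and the example passages are hypothetical, and a real product would send the resulting prompt to whichever LLM API it uses.

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user's question into one prompt.

    Passages are numbered [1], [2], ... so the model can cite its sources.
    """
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their number, e.g. [1]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical passages, as if returned by the document search.
passages = [
    "Returns are accepted within 30 days of delivery.",
    "Refunds are issued to the original payment method within 5 business days.",
]
print(build_rag_prompt("How long do I have to return an item?", passages))
```

The instruction to admit when the sources lack the answer is one common way to keep the model from falling back on its general knowledge.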

How it works technically

  1. Your documents (PDFs, spreadsheets, Notion pages, product catalog) are split into passages of 200-500 words (chunking).
  2. Each passage is converted into a numerical vector (an embedding) by an embedding model, for example from OpenAI or Voyage AI.
  3. The vectors are stored in a vector database such as Pinecone, Qdrant or Supabase pgvector.
  4. When a question arrives, it is converted into a vector the same way, and the 3-5 passages whose vectors are closest to it (typically measured by cosine similarity) are retrieved.
  5. The retrieved passages are given to the LLM as context, alongside the question.
  6. The model answers from those passages and can cite them as sources.
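Steps 1-4 above can be sketched in a few dozen lines of Python. This is a toy that assumes no external services: `toy_embed` is a hypothetical stand-in that counts words instead of calling a real embedding model (OpenAI, Voyage AI), and a plain Python list stands in for the vector database. A word-count vector only matches shared words, whereas real embeddings also match meaning. Steps 5 and 6 are left out because they depend on the chosen LLM API.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, max_words: int = 300) -> list[str]:
    """Step 1: split a document into passages of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str) -> Counter:
    """Step 2 (stand-in): a sparse word-count vector instead of a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Step 3 (stand-in): an in-memory list of (passage, vector) pairs plays the vector database.
index: list[tuple[str, Counter]] = []


def add_document(text: str, max_words: int = 300) -> None:
    """Chunk a document, embed each passage and store it in the index."""
    for passage in chunk_words(text, max_words):
        index.append((passage, toy_embed(passage)))


def search(question: str, k: int = 3) -> list[str]:
    """Step 4: embed the question the same way and return the k most similar passages."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]


add_document("The store is open Monday to Friday from 9:00 to 18:00.")
add_document("Returns are accepted within 30 days of delivery.")
print(search("Are returns accepted after 30 days?", k=1))
# → ['Returns are accepted within 30 days of delivery.']
```

Sorting the whole index is fine for a demo; vector databases use approximate nearest-neighbour indexes so the search stays fast with millions of passages.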

Why you need RAG

A general-purpose AI model doesn't know your prices, return policy, schedules, available medical equipment, customer data or contracts. RAG gives it access to all of that without fine-tuning, which is expensive and slow. Updates are fast: when you change a PDF or add a product, only that document needs to be re-indexed, and the chatbot can use the new information right away.
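An update only touches the changed document: its old passages are dropped, and its new text is chunked and embedded again while the rest of the index stays as it is. A minimal sketch, assuming a hypothetical in-memory index keyed by document ID; `fake_embed` is a placeholder for a real embedding call. Vector databases generally let you insert, replace and delete records by ID, but the exact API differs per product.

```python
def chunk_words(text: str, max_words: int = 300) -> list[str]:
    """Split a document into passages of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def fake_embed(passage: str) -> list[float]:
    """Placeholder for a call to a real embedding model (hypothetical)."""
    return [float(len(passage))]


# Hypothetical index: document ID -> list of (passage, vector) pairs.
index: dict[str, list[tuple[str, list[float]]]] = {}


def upsert_document(doc_id: str, text: str) -> int:
    """Insert or replace one document; other documents are left untouched.

    Returns how many passages were (re-)embedded, i.e. the work this update cost.
    """
    passages = chunk_words(text)
    index[doc_id] = [(p, fake_embed(p)) for p in passages]
    return len(passages)


def delete_document(doc_id: str) -> None:
    """Remove a document and all of its passages from the index."""
    index.pop(doc_id, None)


upsert_document("returns-policy", "Returns are accepted within 30 days.")
upsert_document("opening-hours", "Open Monday to Friday, 9:00 to 18:00.")
# The policy changes: only this one document is chunked and embedded again.
upsert_document("returns-policy", "Returns are accepted within 60 days.")
print(index["returns-policy"][0][0])
# → Returns are accepted within 60 days.
```

Keying passages by their source document is what keeps updates cheap: the cost of a change is proportional to the size of the edited document, not of the whole collection.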

Real use cases

  • Support chatbot answering from a 200-page product manual
  • Legal assistant citing passages from 50 contracts
  • E-commerce chatbot recommending products from the catalog
  • Internal assistant answering questions about company policies (HR, IT, finance)

Typical costs

RAG setup: €800-2,500 (depends on document volume). Cost per conversation: €0.02-0.08. Vector storage: €0-200/month (Supabase pgvector is free for small volumes). Document updates are free after implementation.
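To see how the one-off and recurring costs combine, the ranges above can be turned into a rough first-year estimate. A sketch, assuming a hypothetical volume of 1,000 conversations per month; it uses only the ranges quoted here, and real bills depend on the model, message length and hosting plan.

```python
# Ranges quoted in this article, in euros: (low, high).
SETUP = (800, 2500)
PER_CONVERSATION = (0.02, 0.08)
STORAGE_PER_MONTH = (0, 200)


def first_year_cost(conversations_per_month: int) -> tuple[float, float]:
    """Low and high estimate: one-off setup plus 12 months of usage and storage."""
    low = SETUP[0] + 12 * (conversations_per_month * PER_CONVERSATION[0] + STORAGE_PER_MONTH[0])
    high = SETUP[1] + 12 * (conversations_per_month * PER_CONVERSATION[1] + STORAGE_PER_MONTH[1])
    return low, high


print(first_year_cost(1_000))
# → (1040.0, 5860.0)
```

At that volume the first year lands between about €1,040 and €5,860, and the one-off setup is the largest single item in both estimates.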

Frequently asked questions

What's the difference between RAG and fine-tuning?

Fine-tuning trains the model further on your data: it is expensive, slow, and every update requires retraining. RAG gives the model access to live documents: it is fast, cheap, and updates take effect immediately. For 95% of cases, RAG is enough.

Which vector database should I use?

Pinecone (managed cloud service, easy to use but pricier) for serious production workloads. Qdrant (open source, can be self-hosted for free) if you want full control. Supabase pgvector (vector search built into Postgres) for most SMB cases.

How many documents can I have in a RAG?

There is no practical limit. Real deployments range from 10,000 to 1,000,000 passages per index. Costs (embedding and storage) scale roughly linearly with volume.
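Linear scaling is easy to check with back-of-the-envelope arithmetic, since each passage stores one vector of a fixed size. A sketch, assuming 1536-dimensional float32 embeddings (the default size of OpenAI's `text-embedding-3-small`); other models use other sizes, and the database stores passage text, metadata and its search index on top of this.

```python
def raw_vector_bytes(num_passages: int, dims: int = 1536, bytes_per_value: int = 4) -> int:
    """Size of the raw embeddings only: passages x dimensions x bytes per number.

    Assumes 1536-dimensional float32 vectors; excludes passage text, metadata
    and the search index, which the database stores on top of this.
    """
    return num_passages * dims * bytes_per_value


for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} passages -> {raw_vector_bytes(n) / 1e9:.2f} GB of raw vectors")
```

Ten times more passages means ten times more storage; at the top of the quoted range that is roughly 6 GB of raw vectors before any overhead.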

Can I use RAG for confidential documents?

Yes, but pay attention to where the data is hosted and to GDPR compliance. For sensitive data (medical, legal), we recommend EU-hosted Supabase or self-hosted Qdrant.

