RAG (Retrieval Augmented Generation)
Technique where an AI chatbot answers from YOUR documents (catalog, FAQ, policies), not just from what the model learned. That makes it accurate and updatable without retraining.
In short: what RAG does
RAG stands for "Retrieval Augmented Generation". A chatbot without RAG answers only from what the model learned in training (general data up to a cutoff date). A chatbot with RAG first retrieves the 3-5 most relevant passages from your documents, then the model uses those passages to answer accurately, citing its sources.
How it works technically
- Your documents (PDF, sheets, Notion, catalog) are split into 200-500 word passages (chunking).
- Each passage is turned into a numerical vector (embedding) via OpenAI/Voyage.
- Vectors are stored in a vector database: Pinecone, Qdrant, Supabase pgvector.
- When a question arrives, it's turned into a vector and the 3-5 closest passages are retrieved (by cosine similarity).
- Found passages are given to the LLM as context, alongside the question.
- The model answers accurately, citing the passages.
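The steps above can be sketched end to end in a few lines. This is a minimal illustration, not a production pipeline: the `embed` function here is a toy word-count vector, whereas a real system would call an embedding API (e.g. OpenAI or Voyage) and get dense float vectors, and the final prompt would be sent to an LLM.

```python
import math
import re
from collections import Counter

def chunk(text, max_words=12):
    """Step 1: split a document into fixed-size passages (chunking).
    Real systems use 200-500 words; 12 here so the tiny demo splits."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Step 2 (toy version): turn text into a vector. Here, a sparse
    word-count vector; real systems use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Step 4: cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, passages, k=3):
    """Steps 3-4: score every stored passage against the question
    and return the k closest ones."""
    q = embed(question)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

def build_prompt(question, passages):
    """Step 5: hand the retrieved passages to the LLM as context."""
    context = "\n---\n".join(passages)
    return ("Answer using ONLY the context below. Cite the passage you used.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

docs = chunk("Returns are accepted within 30 days of purchase. "
             "Shipping is free for orders over 50 euros. "
             "Our support line is open Monday to Friday.")
top = retrieve("When are returns accepted?", docs, k=2)
prompt = build_prompt("When are returns accepted?", top)
print(top[0])  # the return-policy passage ranks first
```

Swapping the toy `embed` for real API embeddings and the in-memory list for a vector database (Pinecone, Qdrant, pgvector) turns this sketch into the architecture described above.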
Why you need RAG
A general-purpose AI model doesn't know your prices, return policy, schedule, available medical equipment, customer data, or contracts. RAG gives it access to all of that without fine-tuning (which is expensive and slow). Updates are instant: change a PDF or add a product, and RAG sees it immediately.
Real use cases
- Support chatbot answering from product manual (200 pages)
- Legal assistant citing from 50 contracts
- E-commerce chatbot recommending products from catalog
- Internal assistant answering from company policies (HR, IT, finance)
Typical costs
RAG setup: €800-2,500 (depends on document volume). Cost per conversation: €0.02-0.08. Vector storage: €0-200/month (Supabase pgvector is free for small volumes). Document updates are free after implementation.
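Using the figures above, a rough monthly estimate is simple arithmetic. The numbers below are illustrative assumptions (a mid-range per-conversation cost and a free storage tier), not quotes:

```python
# Rough monthly running-cost estimate for a RAG chatbot.
conversations_per_month = 1_000
cost_per_conversation = 0.05   # euros; midpoint of the 0.02-0.08 range above
vector_storage = 0.0           # euros/month; e.g. pgvector free tier

monthly_cost = conversations_per_month * cost_per_conversation + vector_storage
print(f"~ {monthly_cost:.2f} EUR/month")  # → ~ 50.00 EUR/month
```

At higher volumes, per-conversation cost dominates, so model choice matters more than storage.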
Frequently asked questions
- What's the difference between RAG and fine-tuning?
- Which vector database should I use?
- How many documents can I have in a RAG?
- Can I use RAG for confidential documents?
Related terms
LLM (Large Language Model)
AI model trained on billions of words that understands and generates natural language. 2026 examples: GPT-5, Claude 4.7, Gemini 2.5 Pro.
Vector Database
Database specialized in storing and searching numerical vectors (embeddings) - essential for RAG, AI recommendations, semantic search.
AI Chatbot
Messaging software that converses in text with customers on WhatsApp, Instagram, Messenger or your website, using an AI model connected to your documents (e.g. via RAG).