RAG
Retrieval-Augmented Generation (RAG) means retrieving relevant external documents and supplying them to the model as context before asking a question. This lets the model answer from your specific data rather than relying only on what it saw during training.
Use the following company policy document to answer the
question below. Only use information from the document.
---
DOCUMENT: Employee Leave Policy (2024)
Section 3.1: Annual Leave
Full-time employees receive 20 days of paid annual leave
per calendar year. Part-time employees receive a pro-rata
allocation. Unused leave may be carried over up to a
maximum of 5 days into the next year. Leave beyond the
carryover limit is forfeited on December 31.
Section 3.2: Sick Leave
Employees receive 10 days of paid sick leave per year.
A doctor's note is required for absences exceeding 3
consecutive days.
---
Question: How many vacation days can I carry over to
next year?
The model answers using only the provided document, citing the relevant section.
According to Section 3.1 of the Employee Leave Policy,
you can carry over a maximum of 5 days of unused annual
leave into the next year. Any leave beyond that limit
is forfeited on December 31.
A key feature of RAG is that the model can say "I don't know" when the document doesn't contain the answer.
Using only the document above, what is the policy on
parental leave?
The model correctly identifies that this information isn't in the provided context.
The provided document does not contain information
about parental leave. It only covers annual leave
(Section 3.1) and sick leave (Section 3.2). You may
want to check the full employee handbook or contact
HR for parental leave details.
How RAG works in practice:
- User asks a question
- A retrieval system searches your documents for relevant chunks (using embeddings or keyword search)
- The relevant chunks are inserted into the prompt
- The LLM answers based on the retrieved context
This is how most "chat with your docs" products work.
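The four steps above can be sketched in a few lines of Python. This is a toy illustration, not a real library API: the two-chunk "document store" is made up, and the retriever just counts shared words with the question (a stand-in for embedding or keyword search).

```python
import re

# Minimal RAG pipeline sketch: retrieve relevant chunks, then build
# a grounded prompt from chunks + question.

def retrieve(question, chunks, top_k=1):
    """Return the top_k chunks sharing the most words with the question."""
    q_words = set(re.findall(r"\w+", question.lower()))
    def score(chunk):
        return len(q_words & set(re.findall(r"\w+", chunk.lower())))
    return sorted(chunks, key=score, reverse=True)[:top_k]

def build_prompt(question, retrieved):
    """Assemble the prompt: grounding instruction + chunks + question."""
    context = "\n---\n".join(retrieved)
    return (
        "Use the following document to answer the question below. "
        "Only use information from the document.\n"
        f"---\n{context}\n---\n"
        f"Question: {question}"
    )

# Hypothetical document store: the leave policy split into two chunks.
chunks = [
    "Section 3.1: Annual Leave. Unused leave may be carried over up to "
    "a maximum of 5 days into the next year.",
    "Section 3.2: Sick Leave. Employees receive 10 days of paid sick "
    "leave per year.",
]

question = "How many vacation days can I carry over to next year?"
prompt = build_prompt(question, retrieve(question, chunks))
```

The assembled prompt is then sent to the LLM as in the example above; only the retrieval step changes between products.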
RAG Architecture:
User Question
↓
[Retrieval System] → searches document store
↓
Relevant chunks retrieved
↓
Prompt = chunks + question
↓
[LLM] → generates answer
↓
Answer with citations
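The [Retrieval System] box is most often implemented with embeddings ranked by cosine similarity. A minimal sketch, assuming the chunk and query vectors were already produced by some embedding model (the three-dimensional vectors here are invented for illustration; real embeddings have hundreds of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical precomputed chunk embeddings (stand-ins for the output
# of a real embedding model).
chunk_vectors = {
    "Section 3.1: Annual Leave ...": [0.9, 0.1, 0.2],
    "Section 3.2: Sick Leave ...":   [0.2, 0.8, 0.1],
}
query_vector = [0.8, 0.2, 0.3]  # invented embedding of the user question

# Rank chunks by similarity to the question; the best match goes
# into the prompt.
best = max(chunk_vectors, key=lambda c: cosine(query_vector, chunk_vectors[c]))
```

In production this ranking runs inside a vector database rather than in a loop, but the scoring idea is the same.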
Key benefits:
- Up-to-date information (not limited to training data)
- Domain-specific answers
- Auditable sources
- No fine-tuning needed
Key takeaway: RAG bridges the gap between a general LLM and your specific data. It's the most practical way to build AI that knows about your documents, products, or internal knowledge base.
Tips:
- Include "only use the provided context" to reduce
hallucination
- Ask the model to cite which section it used
- Chunk documents into ~500-token pieces for retrieval
- Combine with embeddings for semantic search
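The chunking tip can be sketched as follows. Token counts are approximated here by whitespace-separated words; a real pipeline would count with the model's tokenizer. The overlap parameter is a common refinement so sentences that straddle a boundary appear in both chunks:

```python
# Split a document into roughly fixed-size, overlapping chunks
# for retrieval.

def chunk_document(text, max_tokens=500, overlap=50):
    """Split text into chunks of ~max_tokens words, each overlapping
    the previous chunk by `overlap` words."""
    words = text.split()
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break
    return chunks
```

Each chunk is then embedded and stored; at query time the retriever searches over these pieces rather than whole documents.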