What is RAG (retrieval-augmented generation)?

A base model answers from what it memorized during training, which is frozen and cannot include your private files or today's news. Retrieval-augmented generation, or RAG, fixes that by looking things up before answering.

The idea is simple: when a question comes in, first search a collection of documents for the relevant pieces, then hand those pieces to the model along with the question, and ask it to answer from them.

Retrieve, then generate

RAG has two steps. Retrieval searches your document collection for the passages most relevant to the question. Generation then feeds those passages to the model as context, so the answer is built from real, specific text rather than from hazy memory. The model still writes the answer; it just has the right material in front of it.
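The two steps can be sketched in a few lines. This is a minimal illustration, not a production design: retrieval here is a toy word-overlap ranking, and `retrieve`, `build_prompt`, `answer`, and `echo_model` are hypothetical names invented for the example. `echo_model` stands in for a real model call and simply returns the prompt it was given.

```python
from typing import Callable, List


def retrieve(question: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Toy retrieval: rank documents by how many question words they share."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, passages: List[str]) -> str:
    """Place the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer(
    question: str,
    documents: List[str],
    generate: Callable[[str], str],
    top_k: int = 2,
) -> str:
    """Retrieve first, then generate from the retrieved passages."""
    passages = retrieve(question, documents, top_k)
    return generate(build_prompt(question, passages))


def echo_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call: returns the prompt unchanged."""
    return prompt


docs = [
    "The office closes at 6 pm on Fridays.",
    "Lunch is served at noon.",
    "Parking is free on weekends.",
]
print(answer("When does the office close on Fridays?", docs, echo_model, top_k=1))
```

Swapping `echo_model` for a real model client changes only the `generate` argument; the retrieve-then-generate shape stays the same.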

How the search finds the right text

Documents are usually turned into embeddings, numerical representations of meaning, and stored in a vector database. A question is embedded the same way, and the system pulls the passages whose meaning is closest. This finds relevant content even when the wording differs from the question.
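As a rough sketch of "closest in meaning", the example below compares vectors with cosine similarity, one common choice of distance. The three-number vectors are made up by hand to stand in for real embeddings, which would come from an embedding model and have hundreds or thousands of dimensions. `cosine_similarity`, `nearest_passages`, and `toy_index` are hypothetical names for this illustration; a vector database would do the same comparison at scale with an approximate index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_passages(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the top_k passages whose vectors are closest to the query."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hand-made vectors standing in for real embeddings (assumption for the demo).
toy_index = {
    "Refunds are issued within 14 days.": [0.9, 0.1, 0.0],
    "Our office is in Lisbon.": [0.0, 0.2, 0.9],
}

# Pretend this is the embedding of "How do I get my money back?"
query_vec = [0.8, 0.3, 0.1]

best_text, best_score = nearest_passages(query_vec, toy_index)[0]
print(best_text)
```

The question shares no words with the refund passage, yet its vector points in nearly the same direction, which is why the match is found despite the different wording.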

Why it cuts hallucination

When the model is handed the actual source text and told to answer from it, it has far less need to invent. You can also show which passages the answer came from, so claims are checkable. It does not eliminate errors, but grounding answers in retrieved evidence makes them far more trustworthy.
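One way to make answers checkable is to number the passages in the prompt and ask the model to cite those numbers. The sketch below assumes that convention; the prompt wording and the names `build_grounded_prompt` and `cited_passages` are illustrative choices, not a standard API.

```python
import re
from typing import List


def build_grounded_prompt(question: str, passages: List[str]) -> str:
    """Number each passage so the model can cite it as [1], [2], ..."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the numbered sources below. "
        "Cite the source number after each claim, like [1]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )


def cited_passages(answer: str, passages: List[str]) -> List[str]:
    """Map [n] markers in an answer back to the passages they refer to.

    Markers that point outside the passage list are ignored, and each
    passage is returned once, in order of first citation.
    """
    found: List[str] = []
    for number in re.findall(r"\[(\d+)\]", answer):
        idx = int(number) - 1
        if 0 <= idx < len(passages) and passages[idx] not in found:
            found.append(passages[idx])
    return found


passages = [
    "Refunds are issued within 14 days.",
    "Our office is in Lisbon.",
]
print(build_grounded_prompt("How long do refunds take?", passages))
print(cited_passages("Refunds take up to 14 days [1].", passages))
```

A citation that points at no passage, or an answer with no citations at all, is a useful signal that the model may have strayed from the sources.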

RAG vs fine-tuning

Both add specialization, but they solve different problems. Fine-tuning changes how the model behaves. RAG changes what the model knows at answer time, and that knowledge updates as soon as you update the documents and re-index them, with no retraining. For "answer from this information" tasks, RAG is usually the right tool.
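To illustrate the point about updating knowledge, here is a toy sketch in which editing a stored document changes what retrieval returns, with no model involved at all. `DocumentStore`, `upsert`, and `search` are hypothetical names, and the word-overlap search is a stand-in for embedding search; in a real system, updating a document would also mean re-embedding it.

```python
from typing import Dict, List


class DocumentStore:
    """Toy in-memory store: documents are kept by id and searched by word overlap."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        """Add a document, or replace the existing one with the same id."""
        self._docs[doc_id] = text

    def search(self, question: str, top_k: int = 1) -> List[str]:
        """Return the top_k documents sharing the most words with the question."""
        q_words = set(question.lower().split())
        ranked = sorted(
            self._docs.values(),
            key=lambda text: len(q_words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]


store = DocumentStore()
store.upsert("hours", "The shop closes at 5 pm.")
store.upsert("delivery", "Delivery takes three days.")

before = store.search("When does the shop close?")

# Opening hours change: only the document is edited, nothing is retrained.
store.upsert("hours", "The shop closes at 8 pm.")

after = store.search("When does the shop close?")
print(before, after)
```

The model that eventually reads these passages is untouched between the two searches; only the text it is handed has changed.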

An analogy

A base model is a student answering a closed-book exam from memory. RAG turns it into an open-book exam: before answering, the student is handed exactly the right pages, so the reply is grounded in the source instead of half-remembered.

Questions

Things people ask.

What problem does RAG solve?

It lets a model answer from information it was never trained on, such as your private documents or recent events, and it grounds answers in real sources so they can be cited and checked.

Is RAG better than fine-tuning?

For adding or updating knowledge, usually yes, because you just change the documents instead of retraining the model. Fine-tuning is better for changing behavior and style. Many systems use both.

Does RAG stop hallucination completely?

No, but it reduces it a lot. If retrieval surfaces the wrong passages, or the model strays from them, errors can still slip through. Showing sources lets you catch those.
