The model's working memory: how much text it can hold in mind at once, and why it seems to forget once you go past it.
A context window is the model's short-term working memory. It is the total amount of text, measured in tokens, that the model can take into account when producing its next response. Everything relevant has to fit inside it.
Crucially, the window holds the whole conversation: your latest message, the earlier back-and-forth, any documents you pasted, and the answer being generated. When it fills up, something has to go.
The context window is not a database and not permanent memory. It is what the model can see right now. Once the conversation grows beyond the window, the oldest tokens are typically dropped to make room, and the model genuinely cannot see them anymore. That is why a long chat can seem to forget how it started.
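The dropping of old text can be sketched in a few lines. This is a minimal illustration, not how any particular assistant is implemented: `count_tokens` is a hypothetical stand-in that counts whitespace-separated words (real tokenizers split text differently), and `fit_to_window` and the 20-token limit are made up for the example.

```python
from collections import deque


def count_tokens(text: str) -> int:
    # Assumption: one word = one token. Real tokenizers differ.
    return len(text.split())


def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined size fits in max_tokens."""
    kept: deque[str] = deque()
    used = 0
    # Walk backwards from the newest message and stop at the first
    # message that would overflow the window.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.appendleft(message)
        used += cost
    return list(kept)


chat = [
    "My name is Ada and I prefer metric units",
    "How far is the Moon",
    "About 384000 kilometres on average",
    "And how long would it take to drive there",
]

visible = fit_to_window(chat, max_tokens=20)
for message in visible:
    print(message)
```

With these numbers the first message no longer fits, so the name and the unit preference it stated are out of the model's view, even though they were never deleted from the chat log itself.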
Prompt, pasted files, prior turns, and the response all draw from the same token budget. Paste a very long document and you leave less room for a long answer. Managing what goes into the window is part of getting good results from long tasks.
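The shared budget is simple arithmetic. The sketch below assumes an illustrative 8,000-token window and made-up token counts for each part; `room_for_answer` is a hypothetical helper, not part of any real API.

```python
def room_for_answer(window: int, prompt: int, history: int, documents: int) -> int:
    """Tokens left for the response after everything else is counted."""
    used = prompt + history + documents
    # The remainder cannot go below zero: an overfull window leaves no room.
    return max(window - used, 0)


# Same prompt and history, different amounts of pasted material.
small_paste = room_for_answer(window=8000, prompt=200, history=1500, documents=1000)
large_paste = room_for_answer(window=8000, prompt=200, history=1500, documents=6000)

print(small_paste)  # 5300
print(large_paste)  # 300
```

Pasting the larger document shrinks the space available for the answer from 5,300 tokens to 300 in this example.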
Context windows have grown from a few thousand tokens in early models to hundreds of thousands or more in modern ones, enough for whole books. But a bigger window costs more per request and can dilute focus: models sometimes pay less attention to material buried in the middle of a very long context. More room is not automatically better use of it.
Because the window is finite, systems that need more knowledge do not just stuff everything in. They fetch only the relevant pieces at the moment they are needed, a technique called retrieval-augmented generation. That keeps the window focused on what matters for the current question.
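A toy version of the retrieval step might look like the following. It ranks passages by how many words they share with the question, which is a deliberate simplification: production systems usually compare embedding vectors instead. The `retrieve` and `build_prompt` functions and the sample knowledge base are hypothetical.

```python
def words(text: str) -> set[str]:
    return set(text.lower().split())


def retrieve(question: str, passages: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k passages sharing the most words with the question."""
    question_words = words(question)
    ranked = sorted(
        passages,
        key=lambda passage: len(question_words & words(passage)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, passages: list[str], top_k: int = 1) -> str:
    # Only the retrieved passages enter the window, not the whole knowledge base.
    context = "\n".join(retrieve(question, passages, top_k))
    return f"Context:\n{context}\n\nQuestion: {question}"


knowledge_base = [
    "Refunds are issued within 14 days of a return request",
    "Shipping to Europe takes 3 to 5 business days",
    "Support is available by email on weekdays",
]

question = "How long does shipping to Europe take"
print(build_prompt(question, knowledge_base))
```

Only the shipping passage reaches the prompt here, so the refund and support text never use up any of the window.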
Picture a desk of a fixed size. You can spread out only so many papers before older ones must be cleared to make room. The context window is the desk. What fell off the edge is not lost from the world, but it is out of sight for now.
Context limits are why a very long conversation with any assistant, including Berges AI, can lose track of early details. Starting a fresh chat for a new topic, or restating the key facts, keeps the model working from what actually matters.
A long chat forgets how it started because the conversation grew past the context window. The earliest messages dropped out of the model's view. It is not choosing to forget; that text is simply no longer in front of it.
A bigger context window is not always better. It lets you include more, but it costs more and models can lose focus on details buried in a very long context. Relevance still beats raw size.
The context window and the token limit are the same idea. The context window is expressed as a maximum number of tokens the model can handle at once, so people also call it the token limit.