What is a context window in AI? Why models seem to forget

A context window is the model's short-term working memory. It is the total amount of text, measured in tokens, that the model can take into account when producing its next response. Everything relevant has to fit inside it.

Crucially, the window holds the whole conversation: your latest message, the earlier back-and-forth, any documents you pasted, and the answer being generated. When it fills up, something has to go.

It is working memory, not storage

The context window is not a database and not permanent memory. It is what the model can see right now. Once the conversation grows beyond the window, something has to be cut, and chat applications typically drop the oldest tokens first. The model genuinely cannot see them anymore. That is why a long chat can seem to forget how it started.
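
As a rough illustration of how the oldest messages fall out of view, here is a minimal sketch. It assumes a crude word-based token count and a hypothetical helper named fit_to_window; real systems use the model's own tokenizer and may summarize old turns rather than simply dropping them.

```python
from collections import deque


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    # Real tokenizers split text differently, so treat this as an estimate.
    return len(text.split())


def fit_to_window(messages: list[str], window_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined size fits the window."""
    kept: deque[str] = deque()
    used = 0
    # Walk backwards from the newest message and stop at the first one
    # that would overflow the budget; everything older is dropped too.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > window_tokens:
            break
        kept.appendleft(message)
        used += cost
    return list(kept)


chat = [
    "My name is Ada and I like tea",
    "What is a context window",
    "It is the model's working memory",
    "Can you remind me of my name",
]

# With an 18-token window, only the three newest messages fit.
visible = fit_to_window(chat, window_tokens=18)
```

In this toy chat the first message, the one that contains the name, is no longer in visible, so a model given only that list could not answer the final question.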

Everything shares the same budget

Prompt, pasted files, prior turns, and the response all draw from the same token budget. Paste a very long document and you leave less room for a long answer. Managing what goes into the window is part of getting good results from long tasks.
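
The shared budget comes down to simple subtraction. The sketch below uses made-up numbers and a hypothetical function name, room_for_response, purely to show how a longer pasted document shrinks the space left for the answer.

```python
def room_for_response(window_tokens: int, prompt_tokens: int,
                      history_tokens: int, document_tokens: int) -> int:
    """Tokens left for the answer once all the inputs are counted."""
    used = prompt_tokens + history_tokens + document_tokens
    # The remainder can never be negative: at zero there is no room left.
    return max(0, window_tokens - used)


# Illustrative figures only: an 8,000-token window, a 200-token prompt,
# and 1,500 tokens of earlier conversation.
with_short_document = room_for_response(8000, 200, 1500, 1000)
with_long_document = room_for_response(8000, 200, 1500, 6000)
```

With these figures the short document leaves 5,300 tokens for the answer, while the long one leaves only 300.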

Bigger windows, and their catch

Context windows have grown from a few thousand tokens in earlier models to hundreds of thousands or more in modern ones, enough for whole books. But a bigger window costs more per request and can dilute focus: models sometimes pay less attention to material buried in the middle of a very long context. More room is not automatically better use of it.

Getting around the limit

Because the window is finite, systems that need more knowledge do not just stuff everything in. They fetch only the relevant pieces at the moment they are needed, a technique called retrieval-augmented generation. That keeps the window focused on what matters for the current question.
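
The retrieval step can be sketched in a few lines. This toy version ranks text chunks by how many words they share with the question. That scoring rule is an assumption made for brevity, since production systems typically compare embeddings instead, and the knowledge base and function names here are invented for illustration.

```python
def words(text: str) -> set[str]:
    # Lowercase each word and strip surrounding punctuation so that
    # "Refunds" and "refunds?" compare as equal.
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks sharing the most words with the question."""
    question_words = words(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_words & words(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical knowledge base, far larger in practice than any window.
knowledge_base = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
]

question = "How long do refunds take?"
selected = retrieve(question, knowledge_base, top_k=1)

# Only the selected chunk is placed in the window alongside the question.
prompt = "\n".join(selected) + "\n\nQuestion: " + question
```

Only the refund policy line ends up in the prompt. The other two chunks stay outside the window until a question makes them relevant.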

An analogy

Picture a desk of a fixed size. You can spread out only so many papers before older ones must be cleared to make room. The context window is the desk. What fell off the edge is not lost from the world, but it is out of sight for now.

Questions

Things people ask.

Why does the AI forget what I said earlier?

Because the conversation grew past the context window. The earliest messages dropped out of the model's view. It is not choosing to forget; that text is simply no longer in front of it.

Is a bigger context window always better?

Not always. A bigger window lets you include more, but each request costs more, and models can lose focus on details buried in a very long context. Relevance still beats raw size.

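How is a context window different from a token limit?

In most usage they are the same idea. The context window is expressed as a maximum number of tokens the model can handle at once, so people also call it the token limit. Some services also set a separate, smaller cap on the length of a single response, and that cap is sometimes called a token limit too.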
