Models do not see words. They see tokens, the fragments that pricing, context limits, and speed are all measured in.
Language models do not process whole words or individual letters. They break text into tokens, pieces that are usually part of a word, a short whole word, or a punctuation mark. "Unbelievable" might become "un", "bel", "iev", "able".
This sounds like a technical detail, but the token is the unit everything else is measured in. Cost, context limits, and speed are all counted in tokens, not words.
A fixed vocabulary of full words would be huge and would still miss new or rare words. Splitting into common fragments keeps the vocabulary manageable while letting the model build any word, including ones it never saw, from pieces. It is a practical compromise between letters and whole words.
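The idea of building any word from known fragments can be sketched in a few lines. This is only a toy: the vocabulary `TOY_VOCAB` and the function `tokenize` are invented for illustration, and the greedy longest-match rule is a simplification. Production tokenizers typically learn their vocabularies from data (for example with byte-pair encoding) and may split the same word differently.

```python
# Toy vocabulary, invented for illustration. Real vocabularies are learned
# from data and usually hold tens of thousands of entries.
TOY_VOCAB = {"un", "bel", "iev", "able", "token", "s"}


def tokenize(word, vocab=TOY_VOCAB):
    """Split a word into fragments using greedy longest-match.

    If no fragment matches at some position, fall back to a single
    character, so any word can still be represented.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matched: emit one character.
            pieces.append(word[i])
            i += 1
    return pieces


print(tokenize("unbelievable"))  # ['un', 'bel', 'iev', 'able']
print(tokenize("tokens"))        # ['token', 's']
print(tokenize("zebra"))         # ['z', 'e', 'b', 'r', 'a']
```

The last call shows the compromise at work: a word the vocabulary has never seen still tokenizes, just into more and smaller pieces.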
For English, one token averages about four characters, and 100 tokens is roughly 75 words. So a page of text is around 500 tokens, and a long document can run to many thousands. Code, other languages, and unusual formatting tokenize differently, sometimes far less efficiently.
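These rules of thumb translate into quick back-of-the-envelope estimates. The sketch below uses only the two averages quoted above; the function names are made up for this example, and the results are rough guesses for English prose, not real token counts.

```python
# Rough English averages from the text above. Real counts vary by
# tokenizer and by content.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75  # 100 tokens is roughly 75 words


def estimate_tokens_from_chars(text):
    """Estimate tokens from the character count of a string."""
    return round(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count):
    """Estimate tokens from a word count."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_words_from_tokens(token_count):
    """Estimate how many English words fit in a token budget."""
    return round(token_count * WORDS_PER_TOKEN)


print(estimate_tokens_from_words(75))    # 100
print(estimate_words_from_tokens(1000))  # 750
```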
Every token, in your prompt and in the reply, has to be processed through the whole network, so tokens map directly to compute. That is why providers price per token and count both directions. A long back-and-forth costs more because the whole conversation is re-processed as context each turn.
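To see why a long back-and-forth adds up, here is a minimal sketch of the arithmetic. It assumes the entire conversation so far is re-sent as input on every turn and that nothing is cached or discounted; the function name and the token counts are hypothetical.

```python
def total_tokens_processed(turns):
    """Sum the tokens processed across a conversation.

    `turns` is a list of (prompt_tokens, reply_tokens) pairs. Assumption:
    every turn re-sends the full history as input, with no caching.
    """
    history = 0  # tokens already in the conversation
    total = 0
    for prompt_tokens, reply_tokens in turns:
        input_tokens = history + prompt_tokens
        total += input_tokens + reply_tokens
        # The new prompt and reply become part of the history.
        history = input_tokens + reply_tokens
    return total


# Three turns, each with a 100-token prompt and a 200-token reply.
conversation = [(100, 200)] * 3
print(total_tokens_processed(conversation))  # 1800
```

The messages themselves total only 900 tokens, yet 1,800 are processed, because the earlier turns are counted again each time they are re-read as context.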
The context window, the maximum the model can consider at once, is measured in tokens. Generation speed is often quoted in tokens per second. Once you think in tokens, the model's pricing, limits, and pacing all line up.
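Both measurements reduce to simple arithmetic. The sketch below assumes the context window has to hold the prompt and the reply together, which is common but not universal; the function names and all numbers are illustrative, not figures for any particular model.

```python
def fits_in_context(prompt_tokens, max_reply_tokens, context_window):
    """Check whether the prompt plus the longest allowed reply fits.

    Assumption: prompt and reply share one window.
    """
    return prompt_tokens + max_reply_tokens <= context_window


def generation_seconds(reply_tokens, tokens_per_second):
    """Estimate how long a reply takes at a steady generation speed."""
    return reply_tokens / tokens_per_second


# Hypothetical model with an 8,000-token window generating 50 tokens per second.
print(fits_in_context(7000, 1000, 8000))  # True
print(fits_in_context(7500, 1000, 8000))  # False
print(generation_seconds(500, 50))        # 10.0
```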
Think of tokens as the syllables a model reads and speaks in. It does not take in whole words at a glance or spell letter by letter. It works in these in-between chunks, and it counts them constantly.
You never have to count tokens to use Berges AI, but they explain what you feel: why very long pastes can hit a limit, and why a concise prompt gets a faster, cheaper answer. Trimming filler is good for you and the model.
On average, one token is about three quarters of a word in English, or roughly four characters. So 1,000 tokens is around 750 words. It varies with the exact text.
Yes, spaces and punctuation count as tokens. Spaces are usually attached to the following word, and punctuation marks are often their own tokens. Everything in the text contributes to the count.
Tokenizers are usually optimized for English. Other languages and scripts can break into more tokens per word, so the same meaning costs more tokens, and therefore more money and context.