What are tokens?

Models do not see words. They see tokens, the fragments that pricing, context limits, and speed are all measured in.

The short version

Language models do not process whole words or individual letters. They break text into tokens, pieces that are usually part of a word, a short whole word, or a punctuation mark. "Unbelievable" might become "un", "bel", "iev", "able".

This sounds like a technical detail, but it is the unit everything else is measured in. Cost, context limits, and speed are all counted in tokens, not words.

Why split into tokens at all

A fixed vocabulary of full words would be huge and would still miss new or rare words. Splitting into common fragments keeps the vocabulary manageable while letting the model build any word, including ones it never saw, from pieces. It is a practical compromise between letters and whole words.
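
To make the idea concrete, here is a minimal sketch in Python of building words from fragments. It assumes a tiny, made-up vocabulary and uses greedy longest-match splitting. That is a simplification: production tokenizers typically learn their fragments from data (byte-pair encoding is a common approach) and may split the same word differently. The names `VOCAB` and `tokenize_word` are hypothetical.

```python
# Hypothetical toy vocabulary of common fragments. Real tokenizers learn
# tens of thousands of fragments from data; this list is made up.
VOCAB = {"un", "bel", "iev", "able", "token", "ize", "s"}


def tokenize_word(word: str, vocab: set[str] = VOCAB) -> list[str]:
    """Split one word by greedy longest match against the vocabulary.

    Any character with no matching fragment becomes its own token, so every
    word can be represented, even one the vocabulary never contained.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate fragment first, shrinking until one matches.
        for end in range(len(word), i, -1):
            piece = word[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            # No fragment matched: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens
```

With this toy vocabulary, `tokenize_word("unbelievable")` returns `["un", "bel", "iev", "able"]`, and `tokenize_word("tokenizes")` returns `["token", "ize", "s"]` even though "tokenizes" is not in the list. A word with no matching fragments at all still gets through one character at a time.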

The rough conversions

For English, one token averages about four characters, and 100 tokens is roughly 75 words. So a page of text is around 500 tokens, and a long document can be thousands. Code, other languages, and unusual formatting tokenize differently, sometimes far less efficiently.
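
The rules of thumb above reduce to simple arithmetic. The sketch below encodes only those English averages (about four characters per token, about 0.75 words per token). The function names are hypothetical, and a real count always comes from the model's own tokenizer.

```python
def estimate_tokens_from_chars(text: str) -> int:
    """Rough English estimate: about four characters per token."""
    return round(len(text) / 4)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough English estimate: 100 tokens is about 75 words."""
    return round(word_count * 100 / 75)


def estimate_words_from_tokens(token_count: int) -> int:
    """Inverse rule: each token is about three quarters of a word."""
    return round(token_count * 0.75)
```

For example, `estimate_words_from_tokens(1000)` gives 750 and `estimate_tokens_from_words(750)` gives 1000. Expect real counts for code or non-English text to come out higher than these estimates.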

Why you are billed in tokens

Every token, in your prompt and in the reply, has to be processed through the whole network, so tokens map directly to compute. That is why providers price per token and count both directions: the tokens you send and the tokens the model writes back. A long back-and-forth costs more because the whole conversation is re-processed as context each turn.
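
To see why long conversations add up, the sketch below totals the tokens processed over several turns. It assumes the full history is sent again as input on every turn and ignores any caching a provider might apply. The per-million-token prices passed to `cost` are placeholders you would replace with a provider's actual rates, and all names here are hypothetical.

```python
def conversation_token_totals(turns: list[tuple[int, int]]) -> tuple[int, int]:
    """Return (total_input_tokens, total_output_tokens) across all turns.

    Each turn is (prompt_tokens, reply_tokens). Assumes the whole earlier
    conversation is sent again as input on every turn, with no caching.
    """
    history = 0
    total_in = 0
    total_out = 0
    for prompt_tokens, reply_tokens in turns:
        # This turn's input is everything said so far plus the new prompt.
        total_in += history + prompt_tokens
        total_out += reply_tokens
        # The prompt and the reply both join the history for the next turn.
        history += prompt_tokens + reply_tokens
    return total_in, total_out


def cost(input_tokens: int, output_tokens: int,
         price_in_per_million: float, price_out_per_million: float) -> float:
    """Cost in currency units, given placeholder per-million-token prices."""
    return (input_tokens * price_in_per_million
            + output_tokens * price_out_per_million) / 1_000_000
```

Three turns of a 100-token prompt and a 200-token reply give `conversation_token_totals([(100, 200)] * 3) == (1200, 600)`: 1,200 input tokens, four times the 300 that the prompts alone contain, because every earlier message is counted again on each later turn.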

Tokens set the limits

The context window, the maximum the model can consider at once, is measured in tokens. Generation speed is often quoted in tokens per second. Once you think in tokens, the model's pricing, limits, and pacing all line up.
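
Both limits are simple to reason about once everything is in tokens. This sketch assumes the context window has to hold the prompt and the reply together, which is common but not universal, and it ignores the delay before the first token arrives. The window size and speed you pass in are placeholders, not figures for any particular model, and the function names are hypothetical.

```python
def fits_in_context(prompt_tokens: int, max_reply_tokens: int,
                    context_window: int) -> bool:
    """True if the prompt plus the space reserved for the reply fits."""
    return prompt_tokens + max_reply_tokens <= context_window


def generation_seconds(reply_tokens: int, tokens_per_second: float) -> float:
    """Rough time to stream a reply at a steady generation rate."""
    return reply_tokens / tokens_per_second
```

For instance, `fits_in_context(7000, 1000, 8000)` is `True`, while a 7,500-token prompt with the same reserve does not fit, and `generation_seconds(500, 50)` estimates 10 seconds for a 500-token reply at 50 tokens per second.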

An analogy

Think of tokens as the syllables a model reads and speaks in. It does not take in whole words at a glance or spell letter by letter. It works in these in-between chunks, and it counts them constantly.

Where Berges AI fits

You never have to count tokens to use Berges AI, but they explain what you feel: why very long pastes can hit a limit, and why a concise prompt gets a faster, cheaper answer. Trimming filler is good for you and the model.

Questions

Things people ask.

How many words is a token?

On average, one token is about three quarters of a word in English, or roughly four characters. So 1,000 tokens is around 750 words. It varies with the exact text.

Do spaces and punctuation count as tokens?

Yes. Spaces are usually attached to the following word, and punctuation marks are often their own tokens. Everything in the text contributes to the count.
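
A simplified illustration of that first splitting step uses a regular expression that keeps a leading space attached to the word after it and puts punctuation in separate pieces. This is not the exact pattern of any particular tokenizer, and real tokenizers go on to break long words into smaller fragments. `pre_split` and `PIECE` are hypothetical names.

```python
import re

# Simplified pattern (an assumption, not any real tokenizer's rule):
#   " ?\w+"       a run of letters/digits, with at most one leading space
#   " ?[^\w\s]+"  a run of punctuation, with at most one leading space
#   "\s+"         any other whitespace
PIECE = re.compile(r" ?\w+| ?[^\w\s]+|\s+")


def pre_split(text: str) -> list[str]:
    """Split text into word, punctuation, and whitespace pieces."""
    return PIECE.findall(text)


if __name__ == "__main__":
    pieces = pre_split("Hello, world!")
    print(pieces)       # ['Hello', ',', ' world', '!']
    print(len(pieces))  # 4
```

Here the space before "world" travels with the word as `" world"`, and the comma and exclamation mark each count as a piece of their own.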

Why does the same text cost more in another language?

Tokenizers are usually optimized for English. Other languages and scripts can break into more tokens per word, so the same meaning costs more tokens, and therefore more money and context.
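
One contributing factor, for tokenizers that work on UTF-8 bytes, is visible without any model at all: English letters take one byte each, while many other scripts take two or three bytes per character. When a tokenizer has learned few fragments for a script, its token count can drift toward that byte count. The sketch below only measures bytes per character. It is an illustration, not a token counter, and `utf8_bytes_per_char` is a hypothetical name.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character in a non-empty string."""
    if not text:
        raise ValueError("text must not be empty")
    return len(text.encode("utf-8")) / len(text)


# English (Latin letters), Russian (Cyrillic), and Chinese samples.
for sample in ["hello", "привет", "你好"]:
    print(sample, utf8_bytes_per_char(sample))
# hello 1.0
# привет 2.0
# 你好 3.0
```

The same greeting costs one, two, or three bytes per character depending on the script, which is part of why identical meaning can cost more tokens outside English.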