Glossary Entry

Token

The basic unit a language model reads and predicts, which may be a word, character, or subword fragment.

Tags: LLMs, Language

Also called: tokens

Seed source: Google ML Glossary

Tokenization breaks raw text into units the model can actually process. Those units are often smaller than words, which is why token counts and word counts do not line up neatly.
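The split into subword units can be sketched with a toy greedy longest-match tokenizer. This is an illustration only, not how production tokenizers like BPE work (those learn their vocabularies from data); the vocabulary here is invented for the example.

```python
def tokenize(word, vocab):
    """Greedily match the longest vocabulary piece at each position,
    falling back to single characters when nothing longer matches."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining slice first, shrinking until a match
        # (a one-character slice always "matches" as the fallback).
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

# A hypothetical two-piece vocabulary: one word becomes two tokens.
print(tokenize("tokenization", {"token", "ization"}))  # ['token', 'ization']
```

Because one word can map to several tokens (and rare words fall back to many small pieces), token counts usually exceed word counts.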

This matters across the chatbot, LLM, and fine-tuning posts because context limits, cost, latency, and next-token prediction are all defined in terms of tokens rather than sentences.