The word "token" entered AI from compiler theory. A token in a compiler is a discrete syntactic unit produced by a lexer: a keyword, an identifier, an operator. The word implies discrete, well-defined, atomic.

Modern AI inherited the word and the implication. Tokens are now the unit of training, the unit of inference, the unit of pricing. Models are described as having context windows of N tokens. Inference is described as costing X cents per million tokens. The vocabulary makes tokens sound like the natural atoms of language processing.
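As a rough illustration of the token as billing unit, the arithmetic is just a rate times a count. The prices below are invented for illustration, not any provider's actual rates.

```python
# A minimal sketch of per-million-token pricing arithmetic.
PRICE_PER_MILLION_INPUT = 0.50   # hypothetical: dollars per million input tokens
PRICE_PER_MILLION_OUTPUT = 1.50  # hypothetical: dollars per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request when the token is the billing unit."""
    return (input_tokens * PRICE_PER_MILLION_INPUT
            + output_tokens * PRICE_PER_MILLION_OUTPUT) / 1_000_000

# e.g. a 12,000-token prompt with an 800-token reply
print(f"${request_cost(12_000, 800):.4f}")
```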

They are not. A "token" in modern LLMs is a byte-pair encoding artifact: a string of characters that appears frequently enough in training data to warrant its own embedding. The procedure that produces tokens is statistical, not linguistic. The tokens themselves are not units of language. They are units of compression, optimized for representation efficiency, not for any property humans associate with words or morphemes.
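A toy sketch of the BPE training loop shows how little linguistics is involved: the only signal is the frequency of adjacent symbol pairs. This is a simplified illustration on a made-up corpus, not any production tokenizer.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across a toy corpus of symbol sequences."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(pair, words):
    """Replace every occurrence of the chosen pair with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: word split into characters -> frequency.
corpus = {tuple("lower"): 5, tuple("lowest"): 2, tuple("newer"): 6, tuple("wider"): 3}
for step in range(4):
    pair = most_frequent_pair(corpus)
    corpus = merge_pair(pair, corpus)
    print(f"merge {step}: {pair}")
# The merges are driven purely by pair frequency; nothing here knows about
# morphemes, syllables, or meaning.
```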

This is why models do strange things at token boundaries. They struggle to count the letters in a word because the whole word may be a single token. They struggle with character-level manipulation because they have never seen characters as units. Their tokenizers split numbers into chunks that have no mathematical meaning.
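This is easy to see with an off-the-shelf tokenizer. The sketch below assumes the `tiktoken` package and OpenAI's `cl100k_base` vocabulary; the exact splits vary by tokenizer, so the point is the shape of the output, not the specific pieces.

```python
# Requires the tiktoken package (pip install tiktoken). The splits shown
# depend on the vocabulary, so treat any expected output as illustrative.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["strawberry", " strawberry", "3.14159265", "1234567"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r:>14} -> {pieces}")

# A word covered by one or two tokens gives the model no direct view of its
# letters, and a number split into arbitrary chunks has no digit-aligned structure.
```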

The right vocabulary would call these "compression units" or "encoding fragments." The wrong vocabulary, which is the one we use, calls them tokens, importing intuitions from compiler theory that do not apply. The intuitions then leak into product decisions, pricing models, and capability claims.

The map is not the territory, but in AI, the map keeps writing the territory by changing how we think about what we have built.
