Standard LLMs generate text one token at a time. Tencent just proposed a way to stop doing that.
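CALM (Continuous Autoregressive Language Models) compresses each chunk of K tokens into a single continuous vector, then predicts the next vector instead of the next token. Fewer steps. More information per step.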
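A toy sketch of the "fewer steps" arithmetic, assuming a chunk size `k`. The helper names `chunk_tokens` and `generation_steps` are hypothetical. The autoencoder that turns each chunk into a vector and the model that predicts the next vector are not modeled here.

```python
import math

def chunk_tokens(tokens, k):
    """Group a token sequence into consecutive chunks of size k.
    In a CALM-style setup, each chunk would become one continuous
    vector via an autoencoder (not modeled in this sketch)."""
    return [tokens[i:i + k] for i in range(0, len(tokens), k)]

def generation_steps(num_tokens, k):
    """Autoregressive steps needed when each step covers k tokens."""
    return math.ceil(num_tokens / k)

tokens = list(range(10))           # stand-in token IDs
chunks = chunk_tokens(tokens, 4)
print(chunks)                      # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(generation_steps(10, 1))     # 10 steps, one token per step
print(generation_steps(10, 4))     # 3 steps, one vector (4 tokens) per step
```

With k = 4, a 10-token output takes 3 steps instead of 10. Each step has to carry more information, which is the trade-off the post describes.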
This might change how language models scale.