At any moment, only a small fraction of neurons in the human brain are firing. Most are quiet. This is not a limitation. It is the design. Sparse activation is energetically cheap, computationally efficient, and may be necessary for the kind of compositional reasoning the brain does.

Frontier language models are dense. Every parameter is active for every forward pass. A 100-billion-parameter model uses 100 billion parameters whether the input is a haiku or a quantum mechanics problem. The energy cost is real, the inference cost is real, and the architectural implication is that current systems may be solving every problem with maximum effort because they cannot do otherwise.
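To see what "maximum effort" means in numbers, here is a back-of-envelope sketch. The ~2 FLOPs per parameter per generated token figure is a standard rule of thumb for dense transformer inference, not something stated in this post:

```python
# Rough cost of a dense forward pass: compute scales with total parameter
# count on every token, whether the input is a haiku or quantum mechanics.
# Assumption: ~2 FLOPs per parameter per token (common rule of thumb).

def dense_flops_per_token(n_params: float) -> float:
    return 2.0 * n_params

print(f"{dense_flops_per_token(100e9):.1e} FLOPs per token")  # ~2.0e+11 for a 100B model
```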

Mixture-of-experts architectures are an attempt to introduce sparsity. They route inputs to subsets of experts, activating perhaps a tenth of total parameters per pass. This is closer to the brain. It is also harder to train, harder to debug, and produces less stable scaling curves. The frontier labs are betting that dense scaling will keep working long enough to reach goals before sparsity becomes mandatory.
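A minimal sketch of the routing idea, in numpy. The expert count, top-k, and dimensions are illustrative choices for this sketch, not figures from any particular lab's model:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 8, 2          # illustrative sizes
W_router = rng.standard_normal((d_model, n_experts)) * 0.02
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]

def moe_forward(x):
    """Route one token to its top-k experts and mix their outputs."""
    logits = x @ W_router                      # score every expert
    top = np.argsort(logits)[-top_k:]          # keep only the k best
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the chosen experts
    # Only top_k of n_experts weight matrices are touched: 2/8 = 25% here.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
print(out.shape, "active expert fraction:", top_k / n_experts)
```

The training difficulty the post mentions comes largely from that `argsort` step: the hard routing decision is not differentiable, so gradients reach only the experts that happened to be selected.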

The brain's strategy suggests this bet is wrong on a long-enough timeline. Intelligence at biological scale runs on roughly 20 watts. Frontier model training runs on tens of megawatts. If the brain found a sparse architecture and AI has not, the gap to close is six orders of magnitude. It will close, eventually, by AI moving toward sparsity. The labs that figure out how to train sparse systems efficiently will own the next era.
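Taking 20 MW as a stand-in for "tens of megawatts" (my number, chosen for the arithmetic, not the post's), the gap checks out:

```python
import math

brain_watts = 20        # common estimate for the human brain
training_watts = 20e6   # assumed stand-in for "tens of megawatts"
print(math.log10(training_watts / brain_watts))  # 6.0 -> six orders of magnitude
```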
