I’m currently nerd-sniped by how rapidly LLM training is shifting to agents, which is building new pathways for training and changing how we measure performance. The question is: how does this show up in open model capabilities?
To date, there haven’t been any meaningful changes in the gap between the best open and closed models on benchmark scores. It has stayed roughly constant at about 6 months.
The key is that open model builders have clearly demonstrated they can keep up on the major benchmarks across various narrow domains.
The only ways that open model builders fall behind are:
they run out of money, or
the training data needed to keep pushing the models becomes closed off.
To date, distillation and buying environments have been major levers for open models keeping up. At the same time, missing revenue for open model labs is a ticking time bomb, though one that won’t go off for multiple years.
There are trends emerging where very heavy agent adoption could upset the stable equilibrium of fast-following that we’ve seen for years.