The article argues that enthusiasm for ever-larger, “God-like” large language models (LLMs) is cooling as progress at the frontier slows and practical shortcomings (such as hallucinations and high costs) persist. In their place, smaller, specialized language models (SLMs), often taught by bigger models, are rapidly improving and proving sufficient for many enterprise tasks. These models are cheaper to run, work reliably on existing on-premises hardware or mobile devices, and are especially well suited to AI agents and on-device use, where speed and efficiency matter. Industry voices and benchmarks suggest today’s small models rival last year’s larger ones, shifting demand toward fine-tuned, task-specific systems rather than monolithic general intelligences.
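The “taught by bigger models” idea is commonly realized as knowledge distillation, where a small student model is trained to match a larger teacher’s output distribution. The sketch below is a minimal, generic illustration of that loss, assuming PyTorch; the tensor shapes and random logits are placeholders, not anything described in the article.

```python
# Minimal knowledge-distillation sketch: a small "student" learns to mimic
# the softened output distribution of a larger, frozen "teacher".
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KL loss between student and teacher predictions."""
    # Softened probabilities from the (large, frozen) teacher.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Log-probabilities from the (small, trainable) student.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence, scaled by T^2 as in standard distillation recipes.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy usage: random logits stand in for real model outputs (illustrative only).
vocab_size = 32
teacher_logits = torch.randn(4, vocab_size)                        # frozen teacher
student_logits = torch.randn(4, vocab_size, requires_grad=True)    # trainable student
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
```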
This shift could vindicate laggards like Apple, whose hybrid approach runs some AI on-device with SLMs and offloads harder tasks to the cloud. While cloud giants continue investing massively in data centers for LLMs, heterogeneity is rising: even OpenAI’s GPT-5 reportedly mixes internal models of different sizes to match task complexity. If SLMs keep improving, they may undercut assumptions that LLMs must power most agentic AI, enabling modular “Lego-like” agents and better economics. In the long run, Apple’s slower, on-device-first strategy may prove prescient despite early stumbles with Apple Intelligence.
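The hybrid pattern the article attributes to Apple and, reportedly, to GPT-5 amounts to routing each request by estimated difficulty: cheap, fast small models handle routine work on-device, and harder tasks are offloaded to a larger cloud model. The sketch below illustrates that routing idea under stated assumptions; the complexity heuristic, model stubs, and threshold are hypothetical and not any vendor’s actual API.

```python
# Illustrative sketch of complexity-based routing between a small on-device
# model and a larger cloud model. All functions here are stand-ins.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Route:
    name: str
    handler: Callable[[str], str]

def call_small_on_device(prompt: str) -> str:
    # Placeholder for an on-device SLM call.
    return f"[small on-device model] {prompt[:40]}..."

def call_large_cloud(prompt: str) -> str:
    # Placeholder for a cloud LLM call.
    return f"[large cloud model] {prompt[:40]}..."

def estimate_complexity(prompt: str) -> float:
    """Crude stand-in for a learned router: longer, multi-step prompts score higher."""
    steps = prompt.count("then") + prompt.count("?")
    return min(1.0, 0.1 * steps + len(prompt) / 2000)

def route(prompt: str, threshold: float = 0.5) -> str:
    """Send simple requests to the cheap local model, hard ones to the cloud."""
    small = Route("slm", call_small_on_device)
    large = Route("llm", call_large_cloud)
    chosen = large if estimate_complexity(prompt) >= threshold else small
    return chosen.handler(prompt)

if __name__ == "__main__":
    print(route("Summarize this paragraph in one sentence."))
    print(route("Plan a data migration: first audit the schema, then write scripts, "
                "then validate the results. " * 20))
```

In a real deployment the router itself is often a small classifier or a policy learned from usage data; the economic point in the article is that most traffic can be answered by the cheaper branch.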