Each advance in AI’s autonomous execution changes what’s possible in the real world.
Nicholas Carlini's agentic experiment – 16 Claude agents, $20K in API costs, 100,000 lines of Rust, a working C compiler that builds the Linux kernel across three architectures – tackled a genuinely hard engineering problem, and it is the kind of result that moves prediction markets.
Carlini had tried the same experiment with earlier Opus models. Opus 4.5, released just months ago, could pass test suites but choked on real projects.
Anthropic's odds of having the best AI model by the end of February jumped from 40% to 75% on Polymarket in the days after Opus 4.6 launched.