
In AI, reliability is more important than accuracy.

But almost no one talks about reliability. It is not mentioned in company comms, measured in benchmarks, or shown in product demos.

Success for AI models is not about how accurate they are, but actually about how reliable they are in being accurate.

When they fail, how do they fail and why? A single metric is not enough. Not even two metrics (accuracy/cost).

You need more.

If model A fails at task 1 today and succeeds at task 2, and tomorrow it succeeds at task 1 and fails at 2, its accuracy is 50%, but its reliability is 0%.
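To make that arithmetic concrete, here is a minimal sketch. It assumes one simple definition of reliability: the share of tasks the model gets right on every run. That definition, the `results` layout, and the function names are my own illustrative assumptions, not necessarily how the study mentioned below measures it.

```python
# Hypothetical outcomes for model A: results[day][task] is True on success.
results = {
    "today": {"task 1": False, "task 2": True},
    "tomorrow": {"task 1": True, "task 2": False},
}

def accuracy(results):
    """Fraction of all (day, task) attempts that succeeded."""
    attempts = [ok for day in results.values() for ok in day.values()]
    return sum(attempts) / len(attempts)

def reliability(results):
    """Fraction of tasks that succeeded on every day (assumed definition)."""
    tasks = next(iter(results.values())).keys()
    consistent = [all(day[task] for day in results.values()) for task in tasks]
    return sum(consistent) / len(consistent)

print(f"accuracy: {accuracy(results):.0%}")        # accuracy: 50%
print(f"reliability: {reliability(results):.0%}")  # reliability: 0%
```

Half of the attempts succeed, but no task succeeds on both days, so the two numbers come apart completely.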

You cannot deploy this model in any serious setting. Enterprise clients will suffer if this happens.

Kapoor, Rabanser, and Narayanan measured this and, apparently, the models (Opus 4.5, Gemini 3.0, GPT-5.2; not the best, but already quite good) are improving in reliability much more slowly than in accuracy.

(h/t Arvind Narayanan and Sayash Kapoor)

My whole point in this long essay that I mention below—where I talk about revenues, CapEx, spending commitments, investments, etc.—is that the circular financing that’s keeping the AI industry afloat is not itself a problem.

I wanted to make that clear. If AI works out, it will be just fine.

The problem is that the figures ($700 billion in CapEx, $100-$200 billion in spending commitments, etc.) only make sense if the confidence of the AI companies building the models, and of their CEOs, proves justified.

They depend on enterprise clients and individual users finding reliable gains from using these models. But these kinds of findings suggest that the opposite could very well be true.

If enterprise clients eventually realize they’re not making a return on their investment—because models are good today, but tomorrow, who knows—then it will all come crashing down.

That’s why, despite the high revenues, there’s a pervasive fear that this is actually a bubble. It’s just not a bubble of the financial kind, but of the technological kind.

AI is being paid for, but does it really work as it should?

May 7 at 5:45 PM