"API" happens to be a letter-for-letter match for apis, the Latin word for bee. It's a fitting coincidence, considering my feed is currently flooded with developers managing swarms of autonomous agents.
Today is Pi Day. 3.14 - the number most of us have internalized as π. It is irrational: infinitely many non-repeating digits after the decimal point.
Does it matter how many decimals we keep, and how we round? In LLM inference, it is the entire battleground. Modern inference quantization formats are essentially extreme rounding exercises, chopping high-precision weights down to 4-bit representations to fit consumer VRAM. Round poorly, and the model loses coherence entirely.
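To make "extreme rounding" concrete, here is a toy symmetric round-to-nearest 4-bit quantizer (a sketch of my own; real formats like GPTQ, AWQ, or NF4 place their 16 levels far more cleverly):

```python
import numpy as np

def quantize_4bit(w):
    # Map each weight onto one of 16 integer levels in [-8, 7],
    # then scale back: every weight can move by up to scale / 2.
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale, scale

w = np.array([0.012, -0.431, 0.250, 0.977, -0.061])
dq, scale = quantize_4bit(w)
print(dq)                      # the weights after "extreme rounding"
print(np.abs(w - dq).max())    # worst-case rounding error, bounded by scale / 2
```

Sixteen levels per weight group is why quantization quality lives or dies on where you put the rounding boundaries.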
But the rounding problem goes all the way down to the bare metal. Mira Murati's new startup, Thinking Machines Lab, recently showed that microscopic rounding differences, caused by batch-size-dependent operation ordering, are exactly why LLM inference isn't deterministic.
When GPUs execute matrix multiplication, they change the order of operations based on batch size to optimize speed. In floating-point math, (a+b)+c does not perfectly equal a+(b+c). The order alters the rounding.
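The non-associativity is easy to see in two lines of Python (2**53 is where float64 stops representing every integer):

```python
big = 2.0 ** 53      # above this, float64 can no longer represent every integer
a, b, c = big, 1.0, -big

print((a + b) + c)   # 0.0 -- the 1.0 vanishes when rounded into big
print(a + (b + c))   # 1.0 -- grouped the other way, it survives
```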
Those tiny decimal variations compound across billions of parameters until the model spits out a completely different output, even at temperature zero. Forcing batch-invariant calculations fixes this, but at a steep cost: roughly a 60% performance penalty.
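Batch invariance can be sketched in miniature (the chunk sizes and the retuning heuristic below are my invention, not Thinking Machines' kernels): a "kernel" that retunes its reduction for larger inputs changes its answer, while one locked to a single reduction shape never does.

```python
def batch_dependent_sum(values):
    # Pretend the kernel picks a bigger chunk when there is more data,
    # the way real GPU kernels retune their reductions for throughput.
    chunk = 1024 if len(values) >= 1024 else 2
    partials = [sum(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    return sum(partials)

def batch_invariant_sum(values):
    # Always the same reduction shape: bitwise-identical results,
    # at the cost of being optimally tuned for nobody.
    chunk = 2
    partials = [sum(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    return sum(partials)

big = 2.0 ** 53
xs = [big, 1.0, 1.0, -big]
padded = xs + [0.0] * 2000   # same numbers, but enough of them to trigger retuning

print(batch_dependent_sum(xs), batch_dependent_sum(padded))   # 1.0 0.0
print(batch_invariant_sum(xs), batch_invariant_sum(padded))   # 1.0 1.0
```

Adding zeros should change nothing, yet the retuned reduction loses both 1.0s to rounding; the fixed-shape reduction pays for its determinism in flexibility.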
Everything in this space compounds. The decimals compound. The precision loss compounds. And most importantly, the costs compound.
We are currently drowning in something Claude defined as π (there's a whole other story there...).
On this π Day, I have this wish: if you are sharing your latest framework built with Claude Code or similar tools, please also share the cost of those experiments.
Share your token cost. No rounding.
Happy π Day!