Last week, The Information reported that xAI’s Colossus-1 in Memphis, Tennessee, achieves a mere 11% MFU (Model FLOPs Utilization), compared with the 45-55% that other hyperscalers achieve.
My latest article is a comprehensive analysis of:
What MFU really is, why it is hard to achieve, and where 50% of the FLOPs are lost: communication bottlenecks, silent data corruption (SDC), and stragglers
Lessons from Google, Meta, ByteDance, and DeepSeek
Who does it best, and
The uncomfortable truth behind deploying the latest NVIDIA GPUs.