Everyone's talking about how Chinese AI models caught up on quality. DeepSeek, Qwen, and others have genuinely closed the gap. But our partner Weijin Research (please subscribe!) makes a compelling case that the competition has quietly shifted to a completely different battlefield: inference speed.

The numbers are striking. Chinese open-source models run at roughly 100 tokens per second, priced from free to $3 per million tokens. US closed models are doing 400 to 1,000+ tokens per second at $45-150. That's not a small gap. It's a different economic tier entirely. And the reason isn't algorithms, it's hardware. Groq's LPU, Cerebras, and Microsoft's Maia 200 are all converging on SRAM-heavy chip architectures purpose-built for fast inference, and China has no domestic equivalent.
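To see why the throughput gap is an economic tier and not a rounding error, here's a back-of-envelope sketch using only the figures quoted above (the response length and the specific tok/s values are illustrative assumptions, not benchmarks):

```python
# Rough latency comparison at the throughput tiers quoted in the post.
# All numbers are illustrative; real serving latency also depends on
# time-to-first-token, batching, and context length.

def generation_time(tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream a response of `tokens` length at a given throughput."""
    return tokens / tokens_per_second

response_tokens = 1_000  # assumed size of one long agent step

slow = generation_time(response_tokens, 100)    # ~Chinese open-source tier
fast = generation_time(response_tokens, 1_000)  # ~fastest US closed tier

print(f"100 tok/s:   {slow:.0f} s per step")
print(f"1,000 tok/s: {fast:.0f} s per step")
```

At 100 tok/s a single long agent step takes about ten seconds; at 1,000 tok/s it takes about one. Chain a dozen such steps in a coding agent and the difference is an interactive tool versus a batch job, which is the premium-pricing point the post is making.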

This matters because speed isn't just a nice-to-have. It unlocks entirely different categories of applications (real-time coding agents, interactive reasoning) that command premium pricing. Without access to these chips, Chinese providers are structurally locked out of the highest-value segment of the AI market, even if their models are just as smart.

You’ll find the link in comments.

Apr 1 at 5:52 PM