
Of course, "benchmarks != real-world performance", and benchmarks have many issues. But what an exciting week for coding LLMs.

We got the open-weight Qwen3-Next-Coder, and we just got the Codex 5.3 / Opus 4.6 double release.

Unfortunately, Anthropic didn’t share SWE-Bench Pro results, but here I put the two side by side based on the available Terminus 2.0 numbers:

Feb 6 at 12:05 AM
