Sebastian Raschka, PhD (@rasbt):

Of course, "benchmarks != real world performance", and benchmarks have many issues. But what an exciting week for coding LLMs.

We got the open-weight Qwen3-Next-Coder, and we just got the Codex 5.3 / Opus 4.6 double release.

Unfortunately, Anthropic didn’t share SWE Bench Pro benchmarks, but here I put them side by side based on the available Terminus 2.0 numbers:

Feb 6, 12:05 AM