GPT 5.5 underperforms Opus 4.7 on SWE-Bench Pro. I couldn't find any reported SWE-Bench scores at all; an internal benchmark is reported instead.

That footnote is trying really hard to bury the lede. GPT 5.5 isn't SOTA for coding.

The footnote: “*Anthropic reported signs of memorization on a subset of problems”

Apr 23 at 9:29 PM