GPT 5.5 underperforms Opus 4.7 on SWE-Bench Pro. I couldn't find any reported SWE-Bench scores at all; an internal benchmark is reported instead.
That footnote is trying really hard to bury the lede. GPT 5.5 isn't SOTA for coding.
The footnote: “*Anthropic reported signs of memorization on a subset of problems”