another day another benchmark drop.
Gemini 3.1 is here.
the stats look pretty good, honestly.
look at that ARC-AGI-2 jump!
it didn’t beat Opus 4.6 on Humanity’s Last Exam, but it’s still a solid score.
BrowseComp is also through the roof, so its agentic search should be really good.