Wild: GPT-5.2 just scored 52.9% on ARC-AGI-2.

The average human score is about 60%.

This isn’t a random metric. This test is supposed to be easy for humans and hard for AI. It’s designed to measure abstract reasoning, not pattern-matching.

Not long ago, GPT-4 was scoring between 0% and 10%.

How is that possible?

Will we hit 60% (human-level) by EOY 2025?

Some say the next metric (GDPval) is even more impressive. It measures real-world knowledge work across 44 occupations. GPT-5.2 matches or beats human professionals about 70.9% of the time.

But I’m a bit skeptical about GDPval. ARC-AGI-2, by contrast, was designed to minimize simple pattern-matching.

Tonight is going to be a testing night!

Dec 11 at 10:11 PM