Gemini 3.0 now has the highest IQ among models, which is impressive. But overall, I'm skeptical about measuring LLM performance on benchmarks designed for humans. Human tests presuppose that the test-taker has acquired all the building blocks that lead to the skill being tested.
LLMs have "jagged intelligence": they can perform very well on some tasks while performing poorly on others. As a result, they can shortcut their way to the top of human tests while skipping the building blocks a person would need to get there.
For example, before learning calculus, humans learn arithmetic and algebra. But you can train an LLM to score fairly well on calculus while performing poorly on those underlying skills. It's all about the training data distribution.
Nov 24 at 5:40 PM