In an interview, OpenAI CTO Mira Murati compared GPT-3's intelligence to that of a child and GPT-4's to that of a smart high school student, and said the next-generation model (GPT-5), set for release in 18 months, will reach PhD level.
Claude 3.5 Sonnet has already pushed the "countdown to AGI" to 75%, becoming the first model to post a test score higher than that of the smartest human PhDs.
In graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding ability (HumanEval), Claude 3.5 Sonnet unexpectedly set new state-of-the-art records.
It scored 90.4% on MMLU and 67.2% on GPQA.
This is also the first time an LLM has surpassed the 65% threshold on GPQA, the level reached by the smartest human PhDs.
For reference, PhD-level non-specialists answering outside their own field score about 34% on GPQA, while PhD-level experts answering within their field score about 65%; Claude 3.5 Sonnet has exceeded both.