Just compared my Devil’s Advocate prompt across 5 different LLMs/Agents on a company I know well:

  1. Claude 4.7 (regular)

  2. ChatGPT 5.5 (regular)

  3. Claude 4.7 (Research)

  4. ChatGPT 5.5 (Deep Research)

  5. Gemini 3.1 (Deep Research)

Conclusions:

  • All were at least decent and usable

  • Gemini was least useful (!), which is surprising because 6 months ago it was at the top

  • The gap between regular and Research/Deep Research for both Claude and ChatGPT was small to medium; the regular versions were already pretty good

My rank order for this task was:

  1. Claude 4.7 Research: 9/10

  2. Claude 4.7: 8.5/10

  3. ChatGPT 5.5 Deep Research: 8.5/10

  4. ChatGPT 5.5: 8/10

  5. Gemini 3.1 Deep Research: 7.5/10

Curious what you're finding if you're comparing workflows across different models/agents. Compound With AI, have you done any comparing recently?

May 8 at 1:39 PM