Just compared my Devil’s Advocate prompt across 5 different LLMs/Agents on a company I know well:
Claude 4.7 (regular)
ChatGPT 5.5 (regular)
Claude 4.7 (Research)
ChatGPT 5.5 (Deep Research)
Gemini 3.1 (Deep Research)
Conclusions:
All were at least decent and usable
Gemini was the least useful (!), which is surprising, since six months ago it was at the top
The gap between the regular and Research/Deep Research modes for both Claude and ChatGPT was small to medium; the regular versions were pretty good
My rank order for this task was:
Claude 4.7 Research: 9/10
Claude 4.7: 8.5/10
ChatGPT 5.5 Deep Research: 8.5/10
ChatGPT 5.5: 8/10
Gemini 3.1 Deep Research: 7.5/10
Curious what you're finding if you've been comparing workflows across different models/agents. Compound With AI - have you done any comparisons recently?