They also found that “reasoning models”, like all of OpenAI’s latest models, which use hidden chain prompts to “appear” to think, make them worse. Loosely, these models turn a single query into many, and that just creates more opportunities for a hallucination to mess things up.