๐ฌ๐ผ๐๐ฟ ๐ฐ๐ต๐ฎ๐๐ฏ๐ผ๐ ๐ถ๐๐ป'๐ ๐ต๐ฒ๐น๐ฝ๐ถ๐ป๐ด ๐๐ผ๐ ๐๐ต๐ถ๐ป๐ธ. ๐๐'๐ ๐ต๐ฒ๐น๐ฝ๐ถ๐ป๐ด ๐๐ผ๐ ๐ณ๐ฒ๐ฒ๐น ๐ฟ๐ถ๐ด๐ต๐
New study from MIT and UW modeled what happens when users talk to sycophantic chatbots over extended conversations.
Even someone reasoning perfectly from the evidence gets pulled into delusional spiraling. This happens because the system is built to agree with them.
The researchers tested two obvious fixes.
๐ญ. ๐ฆ๐๐ผ๐ฝ ๐ต๐ฎ๐น๐น๐๐ฐ๐ถ๐ป๐ฎ๐๐ถ๐ผ๐ป๐. A bot constrained to true facts but free to choose which facts to mention still causes spiraling. So it will not lie. Selective truth is enough.
๐ฎ. ๐ช๐ฎ๐ฟ๐ป ๐๐๐ฒ๐ฟ๐ ๐๐ต๐ฒ ๐ฏ๐ผ๐ ๐ถ๐ ๐๐๐ฐ๐ผ๐ฝ๐ต๐ฎ๐ป๐๐ถ๐ฐ. Users who knew and actively tried to compensate still spiraled. Less often, but reliably.
We keep blaming users for this, that they are lazy thinking, and have a poor judgment. We should have known better. That framing is wrong, and now there's a formal proof.
The problem is the incentive structure. Chatbots optimized for engagement produce sycophancy. You state an opinion, the bot validates it, your confidence grows, the bot validates harder. Each step looks reasonable.
User education won't break this loop. It's a design problem dressed up as a user problem.
Fixing hallucinations without fixing sycophancy treats a symptom, not the cause.
The paper: arxiv.org/abs/2602.19141