I've been adding agents.md and claude.md files to my repos for the past few weeks.
Turns out, that might be making my coding agents worse.
A new paper, "Evaluating agents.md: Are Repository-Level Context Files Helpful for Coding Agents?", tested whether these files actually help, across Claude Code, Codex, and Qwen Code.
The results:
→ LLM-generated context files decreased task success by ~2%
→ Developer-written ones improved it by only ~4%
→ Both increased inference cost by 20%+ and made agents take more steps
Agents followed the instructions perfectly. That was actually the problem.
More instructions → more exploration, more reasoning, more tool calls → but not better outcomes.
The most interesting finding: when they stripped all repo documentation, LLM-generated context files actually helped, and even outperformed developer-written ones.
The lesson? Context files are mostly redundant in well-documented repos. And redundancy creates cognitive load for humans and agents alike.
Stronger models didn't fix it either. Better LLM ≠ better context file.
Practical takeaways if you're maintaining agents.md files:
- Keep them minimal
- Only include essential tooling instructions
- Skip the directory overviews
- Don't restate what's already obvious in the code
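If you want a quick sanity check on your own file, here is a rough sketch of the checklist above as a tiny Python lint. To be clear, none of this comes from the paper: the function name `lint_context_file`, the 40-line budget, and the "three tree lines means a directory overview" rule are all my own illustrative assumptions, and the `make test` line in the sample is made up.

```python
import re

# Hypothetical threshold: an arbitrary "keep it minimal" budget, not a number from the paper.
MAX_LINES = 40

# Matches lines drawn with tree characters, e.g. "├── src" or "│   └── main.py".
TREE_LINE = re.compile(r"^\s*(?:[│|]\s*)*[├└]──\s+\S")


def lint_context_file(text: str, max_lines: int = MAX_LINES) -> list[str]:
    """Return human-readable warnings for a context file's contents."""
    warnings = []
    # Count only non-blank lines so spacing does not affect the budget.
    lines = [ln for ln in text.splitlines() if ln.strip()]

    if len(lines) > max_lines:
        warnings.append(
            f"{len(lines)} non-blank lines; consider trimming to {max_lines} or fewer"
        )

    # Assumption: three or more tree-drawn lines indicate a directory overview.
    tree_lines = sum(1 for ln in lines if TREE_LINE.match(ln))
    if tree_lines >= 3:
        warnings.append(
            f"{tree_lines} lines look like a directory tree; consider dropping the overview"
        )

    return warnings


if __name__ == "__main__":
    sample = (
        "Run tests with `make test`.\n"
        "├── src\n"
        "│   └── main.py\n"
        "├── tests\n"
        "└── docs\n"
    )
    for warning in lint_context_file(sample):
        print(warning)
```

It only catches the mechanical stuff (length, pasted directory trees). Deciding what counts as "essential tooling instructions" is still on you.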
Link to the full paper in the comments!