Make money doing the work you believe in

// Agentic Harness Engineering //

Pay attention to this one, AI devs.

(bookmark it)

Most coding-agent harnesses are still tuned by hand or brittle trial-and-error self-evolution.

This new work introduces Agentic Harness Engineering, a framework that makes harness evolution observable. They do this through three layers: components as revertible files, experience as condensed evidence from millions of trajectory tokens, and decisions as falsifiable predictions checked against task outcomes.

Each edit becomes a contract you can verify or revert.

Results: pass@1 on Terminal-Bench 2 climbs from 69.7% to 77.0% in ten iterations, beating human-designed Codex-CLI (71.9%) and self-evolving baselines like ACE and TF-GRPO.

The evolved harness also transfers across model families with +5.1 to +10.1 point gains, while using 12% fewer tokens than the seed on SWE-bench-verified.

Harness work is the biggest hidden cost in most agent systems. This is the first credible recipe for letting the harness improve itself without drifting into noise.

Paper: arxiv.org/abs/2604.25850

Learn to build effective AI agents in our academy: academy.dair.ai

Apr 29
at
2:13 PM
Relevant people

Log in or sign up

Join the most interesting and insightful discussions.