Paniolo is grounded in Agentic Harness Engineering (AHE), published at ICLR 2026 by researchers from Fudan University, Peking University, and Shanghai Qiji Zhifeng. The paper introduces a closed-loop observability framework that autonomously evolves coding-agent harnesses without base-model retraining.
Ten iterations lift pass@1 on Terminal-Bench 2 from 69.7% to 77.0%, a 7.3 pp gain that surpasses every human-designed baseline (OpenCode, Terminus-2, and Codex) and both self-evolving baselines. The frozen harness transfers to SWE-bench-verified and yields consistent gains of +5.1 to +10.1 pp across three alternate model families.
The evolved harness uses 12% fewer tokens than the seed, an efficiency advantage that becomes more valuable as token prices rise. Better performance and lower cost are not in tension: a higher-quality harness delivers both.
Paniolo was already building toward this. We treat every agent error as infrastructure debt and every correction as a permanent improvement to the intelligence layer. The research validated the architecture we were already constructing.
Terminal-Bench 2
SWE-bench-verified
ICLR 2026
Cross-model Transfer
Tools · Middleware · Memory