I caught and patched a $20-30 per minute cost-runaway bug in OpenClaw core.
openclaw/openclaw#76345 · closes #76293
The bug, in plain English
When an OpenClaw agent talks to a model like Claude Sonnet and the connection silently stalls, the system is supposed to time out and try again. But the retry would hit the same wedged connection, time out again, retry, time out, forever. Every retry is a paid API call.
One reporter logged 761 to 1,384 paid Claude Sonnet 4.6 calls in 60 seconds across two real incidents. $20-30 burned in a single minute, masked by auto-recharge on the provider account so users only found out at billing.
The fix
A circuit breaker at the outer agent run loop. After 5 stalled attempts in a row with no model output, the system stops trying and refuses further calls until the connection actually responds again. Worst case is now roughly $0.10-$0.30 per incident instead of $20-30. Routing semantics intentionally unchanged so existing deployments keep working.
Files touched
- ·src/agents/pi-embedded-runner/run.ts
- ·src/agents/pi-embedded-runner/run/idle-timeout-breaker.ts
- ·src/agents/pi-embedded-runner/run/idle-timeout-breaker.test.ts
- ·CHANGELOG.md