Cost Optimization, Day 4 of the Free Comprehensive OpenClaw Course
Cut Your Bill by 70% Without Losing Quality
Why this matters
OpenClaw cost optimization is the difference between a $17 a month agent and a $70 a month agent doing the same work. The biggest cost lever is heartbeat math, the second is two-tier model routing, the third is prompt caching, and the fourth is prompt bloat in the .md files. This lesson walks every lever in the order they actually move the bill, with the exact dial settings I run in production deployments.
How do I reduce my OpenClaw bill?
OpenClaw cost optimization comes down to four levers, in the order they actually move the bill: heartbeat tuning, two-tier model routing, prompt caching, and prompt bloat in the .md files. Get all four right and a $70-a-month agent becomes a $17-a-month agent doing the same work. I have done this on real client deployments more than once.
The heartbeat lever is the biggest. Every heartbeat tick is a paid API call. If your HEARTBEAT.md loop fires every minute, that is 1,440 paid calls a day per agent. If it fires every 15 minutes, that is 96 calls a day. Same agent, same work, 15x cheaper. Most of the things people put in HEARTBEAT.md are perfectly happy with a 15 or 30 minute interval. Start there, and tighten only if you can name a specific reason.
The second lever is provider routing. Cloud frontier models are usually 5x to 15x more expensive than the cheaper tier from the same provider for the same prompt. Most prompts in a personal agent do not need the frontier model.
The heartbeat math, in real numbers
The arithmetic is simple, the result is shocking. A heartbeat tick on Claude Sonnet with the agent's full workspace context is roughly $0.02 per tick. Multiply by tick frequency:
- 1 minute heartbeat: 1,440 ticks per day, 43,200 per month, $864 a month
- 5 minute heartbeat: 288 per day, 8,640 per month, $173 a month
- 15 minute heartbeat: 96 per day, 2,880 per month, $58 a month
- 30 minute heartbeat: 48 per day, 1,440 per month, $29 a month
- 1 hour heartbeat: 24 per day, 720 per month, $14 a month
The same agent, same memory, same instructions, same SOUL.md. The only thing changing is how often the runtime fires the loop. The reason this matters so much is that 90 percent of ticks have nothing to do, the agent just looks at HEARTBEAT.md, decides nothing applies, and goes back to sleep. That decision still costs $0.02. Tighter intervals just buy more "nothing to do" decisions.
Move the heartbeat decision to a cheap-tier model and the per-tick cost drops to roughly $0.002. Same intervals: 1 minute is now $86 a month, 15 minute is $5.80, 1 hour is $1.40. This is the single highest-leverage edit in the whole runtime, and it is one line in AGENTS.md.
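The table above is easy to rebuild for your own deployment. A quick sketch of the arithmetic in Python, with the two per-tick costs as rough assumptions rather than quoted prices:

```python
# Back-of-envelope heartbeat cost model using the rough numbers above.
# Both per-tick figures are assumptions; plug in your own measured costs.
SMART_TIER_PER_TICK = 0.02   # ~Sonnet-class tick with full workspace context
CHEAP_TIER_PER_TICK = 0.002  # ~Haiku-class tick, same context

def monthly_heartbeat_cost(interval_minutes: float, per_tick: float,
                           days: int = 30) -> float:
    """Cost of firing the heartbeat loop every `interval_minutes` for a month."""
    ticks_per_day = 24 * 60 / interval_minutes
    return ticks_per_day * days * per_tick

for minutes in (1, 5, 15, 30, 60):
    smart = monthly_heartbeat_cost(minutes, SMART_TIER_PER_TICK)
    cheap = monthly_heartbeat_cost(minutes, CHEAP_TIER_PER_TICK)
    print(f"{minutes:>3} min interval: ${smart:>7.2f}/mo smart, ${cheap:>6.2f}/mo cheap")
```

Run it once with your own per-tick cost and the interval decision stops being a guess.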
What is two-tier model processing and why does it work?
Two-tier means you wire the agent to two models from the same provider, a cheap fast tier and a smart slow tier, and the runtime decides per prompt which to use. The decision is based on prompt complexity, tool count, expected response length, and whether the user has explicitly asked for a "deep" response.
The economics: about 80% of prompts in a normal personal agent are routine, "remind me what is on my calendar today", "draft a reply to this email", "summarize the last five messages from Bob". These are perfectly handled by Claude Haiku at roughly a tenth of Sonnet's per-token cost. The other 20% are hard, "plan the rollout for the new feature given these five constraints", and those get routed to Sonnet.
The result: average per-prompt cost drops by roughly 60% with no measurable quality drop on routine work. The lever sits in your AGENTS.md as a routing rule. The exact syntax varies by version, the official docs have the current pattern.
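The routing rule itself lives in AGENTS.md, but the decision logic behind it looks roughly like this. The signal names and thresholds in this sketch are illustrative assumptions, not OpenClaw's actual routing syntax:

```python
# A minimal sketch of the two-tier routing decision described above.
# Field names and thresholds are illustrative, not OpenClaw's real config.
from dataclasses import dataclass

@dataclass
class PromptSignals:
    token_count: int            # length of the incoming prompt
    tool_count: int             # tools the prompt is likely to invoke
    expects_long_response: bool
    user_asked_deep: bool       # explicit "think hard" / "deep" request

def pick_tier(p: PromptSignals) -> str:
    """Route routine work to the cheap tier, hard work to the smart tier."""
    if p.user_asked_deep:
        return "smart"
    if p.token_count > 2000 or p.tool_count > 3 or p.expects_long_response:
        return "smart"
    return "cheap"  # the ~80% routine case

print(pick_tier(PromptSignals(120, 1, False, False)))   # routine calendar check
print(pick_tier(PromptSignals(3500, 5, True, False)))   # multi-constraint planning
```

The point of the sketch: the decision is cheap and mechanical, so there is no reason to pay smart-tier prices for it.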
Prompt caching savings, the free 90 percent
Anthropic's prompt cache is the most important pricing feature of the year. The mechanic: when you mark sections of your prompt as cacheable, the provider stores the tokenized representation on their side for five minutes. Subsequent requests that reuse the cached prefix pay roughly 10 percent of the normal input token cost for those tokens.
For an OpenClaw agent, the cacheable prefix is huge. SOUL.md, AGENTS.md, MEMORY.md, KNOWLEDGE.md all get re-sent on every prompt. That is often 8k to 15k tokens of stable context. Without caching, every prompt pays the full input price for those tokens. With caching, every prompt after the first one in any 5-minute window pays a tenth.
The runtime config to enable it:
provider:
  anthropic:
    enable_prompt_cache: true
    cache_breakpoints:
      - soul
      - agents
      - memory
      - knowledge
One line per breakpoint. Enable it once and forget it. On a heartbeat-driven agent that fires 96 times a day with a 12k-token stable context, this saves roughly $20 to $30 a month on its own. There is no quality trade-off, the prompt sent to the model is exactly the same.
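To sanity-check the savings on your own deployment, the arithmetic fits in a few lines. The input rate below is a placeholder assumption; swap in your provider's actual per-million-token price and your own prefix size, since the real savings depend entirely on those two numbers:

```python
# Rough savings estimate for caching the stable .md prefix, using the
# ~10%-of-input-price figure from above. The rate is an assumed placeholder.
INPUT_PER_MTOK = 3.00       # assumed $/million input tokens; use your real rate
CACHE_READ_DISCOUNT = 0.10  # cached reads cost ~10% of the normal input price

def monthly_prefix_cost(prefix_tokens: int, prompts_per_day: int,
                        cached: bool, days: int = 30) -> float:
    rate = INPUT_PER_MTOK * (CACHE_READ_DISCOUNT if cached else 1.0)
    return prefix_tokens / 1_000_000 * rate * prompts_per_day * days

uncached = monthly_prefix_cost(12_000, 96, cached=False)
cached = monthly_prefix_cost(12_000, 96, cached=True)
print(f"uncached ${uncached:.2f}/mo, cached ${cached:.2f}/mo, "
      f"saved ${uncached - cached:.2f}/mo")
```

Note this ignores the small premium providers charge on the first cache write in each window, so treat the output as an upper bound on the savings.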
How do I set spending limits on an OpenClaw agent?
Two layers of spending limits. The provider-side cap is the hard one, set this in your Anthropic, OpenAI, or Gemini console. Anthropic has "Spend management" in the org settings, set a monthly maximum and the API will return 429 once you hit it. OpenAI has "Usage limits". Gemini has "Quota". Set all three even if you only use one provider regularly, the unused providers are your fallback chain so they need limits too.
The agent-side cap is softer but more granular. Set a daily and weekly spend ceiling in your AGENTS.md and the runtime tracks accumulated cost across runs. When you hit the cap, the agent surfaces a warning to its channels and pauses non-essential heartbeat work until the next window opens.
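The cap mechanics are simple enough to sketch. This is an illustrative model of the behaviour described above, not OpenClaw's actual API; the class and method names are made up:

```python
# Illustrative model of an agent-side daily spend cap: accumulate per-run
# cost and pause non-essential work once the ceiling is hit.
import time

class SpendTracker:
    def __init__(self, daily_cap_usd: float):
        self.daily_cap = daily_cap_usd
        self.spent_today = 0.0
        self.day = time.strftime("%Y-%m-%d")

    def record(self, cost_usd: float) -> None:
        today = time.strftime("%Y-%m-%d")
        if today != self.day:                 # new window, reset the counter
            self.day, self.spent_today = today, 0.0
        self.spent_today += cost_usd

    def allow_nonessential(self) -> bool:
        """Heartbeat niceties pause at the cap; essential work continues."""
        return self.spent_today < self.daily_cap

tracker = SpendTracker(daily_cap_usd=2.00)
tracker.record(1.50)
print(tracker.allow_nonessential())  # True, still under the cap
tracker.record(0.75)
print(tracker.allow_nonessential())  # False, paused until the next window
```

The provider-side cap remains the real backstop; this layer just stops a normal day from quietly becoming an expensive one.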
The other thing to set up before you ship is prompt caching, covered in the previous section: one flag in your config, the runtime handles the rest, and the repeated .md context the agent re-sends on every prompt gets about 90% cheaper. Free money, take it.
Prompt bloat, the silent killer
The fourth lever is the one nobody thinks about until they audit a deployment six months in. The .md files in your workspace get re-sent on every prompt. Every word you let drift into MEMORY.md is paid for, every prompt, forever. A bloated MEMORY.md silently doubles your bill.
The audit pass I run on every client agent: open each .md file, count the tokens (a rough estimate is 4 characters per token, or use a tokenizer), and ask whether each section is actually used. Common culprits: stale facts about people who left the company two years ago, project notes from finished work that never got pruned, conversation summaries that the dreaming process should have folded into shorter facts but did not.
Healthy targets, in tokens: SOUL.md under 2k, AGENTS.md under 3k, MEMORY.md under 5k, KNOWLEDGE.md under 5k. If any file is bigger, ask why. The dreaming process (openclaw memory.md) helps, but it never deletes, only promotes. Manual pruning is part of operating an agent.
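The audit pass is scriptable. A rough helper using the 4-characters-per-token estimate, with the target sizes from above; the workspace path in the commented-out call is an example, adjust it for your install:

```python
# Quick workspace audit using the ~4 characters per token rule of thumb.
# Targets mirror the healthy sizes listed in the lesson.
from pathlib import Path

TARGETS = {"SOUL.md": 2000, "AGENTS.md": 3000,
           "MEMORY.md": 5000, "KNOWLEDGE.md": 5000}

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough; use a real tokenizer for exact counts

def audit(workspace: Path) -> None:
    for name, target in TARGETS.items():
        path = workspace / name
        if not path.exists():
            continue
        tokens = estimate_tokens(path.read_text())
        flag = "OVER TARGET" if tokens > target else "ok"
        print(f"{name:<14} ~{tokens:>6} tokens (target {target})  {flag}")

# audit(Path("~/.openclaw/workspace").expanduser())  # example path, adjust
```

Run it monthly and the bloat never gets to six months old.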
A real before-and-after
One client deployment, a personal assistant agent for a busy founder, was running at $73 a month before the optimization pass. Heartbeat every 5 minutes, single-tier on Claude Sonnet, no prompt cache, MEMORY.md was 14k tokens because nothing had ever been pruned.
After the pass: heartbeat at 20 minutes (still catches everything, the founder did not notice the difference), two-tier on Sonnet plus Haiku, prompt caching on, MEMORY.md curated down to 4k tokens. New monthly bill: $17. Same agent, same channels, same outputs.
The cost-runaway fix writeup covers the related bug I patched in core, the one where a single wedged connection could burn $20 to $30 in 60 seconds. That fix is now in core, but knowing how the cost levers work is still on you.
What goes wrong, the cost-runaway failure modes
The bills people complain about are almost always one of three failure modes. The first: a wedged provider connection. A request hangs, the runtime retries, the retry hangs, the runtime retries again. Before 2026.4 the retry loop had no circuit breaker and could fan out hundreds of paid calls in 60 seconds. The patch I shipped in PR #76345 fixed this in core. Confirm you are on 2026.4 or newer with openclaw --version.
The second, tool-call recursion. The agent calls a Skill, the Skill returns, the agent decides another Skill call is needed, that Skill returns, and so on. Without a depth limit, a confused agent can rack up 50+ tool calls on a single prompt. Set max_tool_calls_per_prompt: 10 in AGENTS.md as the safety rail. If the agent legitimately needs more, it can ask for the limit raised, the runtime surfaces the request to your channel.
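The rail is conceptually just a counter. An illustrative sketch of what max_tool_calls_per_prompt guards against; the class names here are made up, the real rail lives in the runtime:

```python
# Illustrative sketch of a per-prompt tool-call budget: count calls and
# stop the loop at the limit instead of recursing on paid calls forever.
class ToolCallBudgetExceeded(Exception):
    pass

class ToolCallBudget:
    def __init__(self, max_calls: int = 10):
        self.max_calls = max_calls
        self.used = 0

    def charge(self, skill_name: str) -> None:
        self.used += 1
        if self.used > self.max_calls:
            # Surface to the channel instead of burning more paid calls.
            raise ToolCallBudgetExceeded(
                f"{skill_name}: call {self.used} exceeds limit {self.max_calls}")

budget = ToolCallBudget(max_calls=3)
for skill in ("calendar", "email", "search"):
    budget.charge(skill)           # three calls fit within the budget
try:
    budget.charge("search")        # the fourth trips the rail
except ToolCallBudgetExceeded as e:
    print("stopped:", e)
```

Ten is a generous default; a confused agent hits it in seconds, a legitimate one almost never does.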
The third, memory bloat that nobody noticed. The dreaming process promotes things into MEMORY.md but never deletes. Six months in, MEMORY.md is 18k tokens of mostly-irrelevant facts, paid on every prompt. The fix is the manual prune pass from earlier in this lesson. Run it monthly, more often if the agent's bill is creeping up unexplained.
How this connects to your full agent
Cost optimization is not a one-time pass, it is a habit. Every time you add a Skill, the manifest declares which tools and which provider it uses. Every time you edit MEMORY.md, you add tokens that get paid on every prompt forever. Every time you tighten the heartbeat interval, you 2x or 5x your bill. The four levers in this lesson are the lens.
The right next reads. Openclaw heartbeat.md on day 6 walks the heartbeat in detail, including the cost shape per use case. Openclaw memory.md on day 8 covers MEMORY.md curation and the dreaming process that helps prevent bloat. Openclaw agents.md on day 9 is where the spending caps and routing rules live.
If you only do one thing from this lesson today, enable prompt caching. The line in your config takes ten seconds, the savings start immediately, and there is no downside. Then move to heartbeat tuning. Then the two-tier routing. Bill drops by 70 percent, agent does the same work.
Key takeaways
- Heartbeat interval is the single biggest cost lever; doubling it usually halves the bill.
- Two-tier routing sends 80% of work to a cheap fast model and only the hard 20% to the smart one.
- Prompt caching on Anthropic cuts the cost of repeated agent context by 90%.
- Set a hard spending limit on every provider before you run the agent unattended.
About the instructor. Adhiraj Hangal teaches this lesson. Founder of OpenClaw Consult and one of the few consultants whose code is merged in openclaw/openclaw core. PR #76345 was reviewed and merged by project creator Peter Steinberger. Read the contribution log.
Need help shipping openclaw cost optimization in production?
OpenClaw Consult ships production-grade OpenClaw deployments for operators and founders. Founded by Adhiraj Hangal, a merged contributor to openclaw/openclaw core.
Hire an OpenClaw expert.