In This Article
- Introduction
- Video Tutorial
- What an Unoptimized Deployment Costs
- Understanding Tokens: Input vs Output
- The Two-Tier Model Strategy
- The Heartbeat Math
- Prompt Bloat: The Invisible Cost Driver
- Setting Spending Limits
- Before vs After: $70 to $17
- Prompt Caching: The Advanced Move
- Action Checklist
- Frequently Asked Questions
Introduction
This is Day 4 of the 16-day OpenClaw Bootcamp. Today we tackle the question everyone asks after their first week: "Why is my API bill so high?" The answer is almost always the same — your heartbeat is too frequent, your prompts are too bloated, and you're using your premium model for everything.
By the end of this session, you'll have a dual-model config, a pruned soul.md, and spending limits in place. Most users see their monthly costs drop from ~$70 to ~$17 after making these changes.
What You'll Build Today
- A dual-model config with a premium model for conversations and a budget model for background tasks
- A pruned soul.md that stops bleeding tokens on every API call
- Spending limits and alerts so your agent never runs away with your budget
- A baseline cost snapshot to measure your savings after a week
Video Tutorial
Watch the full Day 4 video with real cost breakdowns and live config changes:
What an Unoptimized Deployment Costs
A typical unoptimized OpenClaw setup running Claude 3.5 Sonnet with a 30-minute heartbeat and a verbose soul.md costs roughly $70/month in API fees. That's not because OpenClaw is expensive — it's because LLM tokens add up fast when your agent runs 24/7 and you haven't tuned the defaults.
The three biggest cost drivers:
- Heartbeat frequency — every cycle sends your full system prompt to the model
- Prompt size — a bloated soul.md means more input tokens on every single call
- Model choice — using your premium model for background tasks that don't need it
Understanding Tokens: Input vs Output
LLM pricing has two components: input tokens (what you send to the model) and output tokens (what the model generates). Input tokens are cheaper per token but add up fast, because your system prompt, memory, and conversation history are re-sent on every call. Output tokens cost more per token, but there are far fewer of them.
For a running OpenClaw agent, input tokens are typically 80–90% of your bill. That's why reducing prompt size has such a massive impact.
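To see that split concretely, here's a back-of-the-envelope estimate. The call volume, token counts, and per-million-token rates below are illustrative assumptions (roughly Sonnet-class list pricing), not a reading of your actual bill:

```python
# Rough illustration of why input tokens dominate a 24/7 agent's bill.
# Prices and volumes are hypothetical placeholders -- substitute your
# provider's actual per-million-token rates.
INPUT_PRICE = 3.00 / 1_000_000    # $ per input token
OUTPUT_PRICE = 15.00 / 1_000_000  # $ per output token

calls_per_day = 60             # heartbeats + conversations
input_tokens_per_call = 2_500  # system prompt + soul.md + memory + history
output_tokens_per_call = 100   # short replies and housekeeping notes

input_cost = calls_per_day * input_tokens_per_call * INPUT_PRICE
output_cost = calls_per_day * output_tokens_per_call * OUTPUT_PRICE
share = input_cost / (input_cost + output_cost)
print(f"input: ${input_cost:.2f}/day, output: ${output_cost:.2f}/day, "
      f"input share: {share:.0%}")
# -> input: $0.45/day, output: $0.09/day, input share: 83%
```

Even with output tokens priced 5x higher per token, the input side dominates, which is why the rest of this session focuses on shrinking what you send.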
The Two-Tier Model Strategy
The single most impactful optimization: use a premium model (Claude Sonnet, GPT-4o) for conversations and a budget model (Claude Haiku, GPT-4o-mini) for background tasks like heartbeats and memory management.
In your OpenClaw config, set:
- Primary model: Your best model — handles direct conversations with you
- Secondary model: A cheaper model — handles heartbeats, memory compaction, and other automated tasks
The quality difference for background tasks is negligible. Your agent doesn't need Claude Sonnet to check if it has any scheduled reminders. But the cost difference is 5–10x.
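As a sketch of the shape this takes in config, here's the two-tier idea expressed as a Python dict. The key names are hypothetical, not OpenClaw's real schema — map them onto whatever your config file actually calls its model slots:

```python
# Illustrative two-tier model config. Key names are hypothetical --
# check your OpenClaw config documentation for the real field names.
config = {
    "primary_model": "claude-3-5-sonnet",  # direct conversations with you
    "secondary_model": "claude-3-haiku",   # background/automated work
    "heartbeat": {
        "interval_minutes": 60,
        "model": "secondary",  # route heartbeat cycles to the budget tier
    },
    "memory_compaction": {
        "model": "secondary",  # summarization doesn't need the premium tier
    },
}
```

The point of the sketch: every automated, non-conversational task gets explicitly routed to the budget tier, so the premium model is only ever invoked when you're actually talking to the agent.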
The Heartbeat Math
The heartbeat is OpenClaw's background processing cycle — it's how your agent "thinks" when you're not talking to it. Each heartbeat sends the full system prompt to the model.
The math with a 30-minute heartbeat:
- 48 heartbeats/day × ~2,000 input tokens = 96,000 tokens/day
- At Claude 3.5 Sonnet input pricing (~$3 per million tokens): ~$0.29/day, or ~$8.64/month just for heartbeats
Switch to a 60-minute heartbeat:
- 24 heartbeats/day × ~2,000 input tokens = 48,000 tokens/day
- At Claude Haiku input pricing (~$0.25 per million tokens, on the secondary model): ~$0.012/day, or ~$0.36/month
That one change — extending the interval and using a budget model — saves you $8.28/month on heartbeats alone.
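The arithmetic above can be checked in a few lines. The per-million-token rates are published list prices at the time of writing; verify them against your provider's current pricing page:

```python
# Reproduce the heartbeat cost math. Rates are per-token input prices
# derived from published per-million-token list pricing.
SONNET_INPUT = 3.00 / 1_000_000  # Claude 3.5 Sonnet input
HAIKU_INPUT = 0.25 / 1_000_000   # Claude Haiku input
TOKENS_PER_HEARTBEAT = 2_000

def monthly_cost(beats_per_day, price_per_token, days=30):
    """Heartbeat input cost over a month."""
    return beats_per_day * TOKENS_PER_HEARTBEAT * price_per_token * days

before = monthly_cost(48, SONNET_INPUT)  # 30-min interval, premium model
after = monthly_cost(24, HAIKU_INPUT)    # 60-min interval, budget model
print(f"before: ${before:.2f}/mo, after: ${after:.2f}/mo, "
      f"saved: ${before - after:.2f}/mo")
# -> before: $8.64/mo, after: $0.36/mo, saved: $8.28/mo
```

Note that the two changes multiply: halving the frequency alone only halves the cost, but halving it *and* dropping to a ~12x cheaper input rate is what gets you from $8.64 to $0.36.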
Prompt Bloat: The Invisible Cost Driver
Your soul.md is included in every API call. If it's 3,000 tokens when it could be 800, you're paying 3.75x more than necessary on every single interaction.
How to audit:
- Open your soul.md and read every line critically
- Remove placeholder text, redundant instructions, and examples your agent has already learned
- Consolidate similar instructions into concise statements
- Move context that doesn't need to be in every call to memory files instead
A well-pruned soul.md is usually 500–1,000 tokens. If yours is over 2,000, there's almost certainly bloat to cut.
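To put a dollar figure on pruning, here's the same style of estimate applied to soul.md size. The call volume and premium-model rate are illustrative assumptions:

```python
# What prompt bloat costs: same agent, same traffic, different soul.md size.
# Call volume and price are illustrative assumptions.
INPUT_PRICE = 3.00 / 1_000_000  # $ per input token (premium-model rate)
CALLS_PER_DAY = 60

def soul_cost_per_month(soul_tokens, days=30):
    """Monthly cost of re-sending soul.md on every call."""
    return soul_tokens * CALLS_PER_DAY * INPUT_PRICE * days

bloated = soul_cost_per_month(3_000)
pruned = soul_cost_per_month(800)
print(f"bloated: ${bloated:.2f}/mo, pruned: ${pruned:.2f}/mo "
      f"({bloated / pruned:.2f}x)")
# -> bloated: $16.20/mo, pruned: $4.32/mo (3.75x)
```

The 3.75x multiple holds at any call volume, since soul.md rides along on every request; the absolute dollar savings scale with how busy your agent is.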
Setting Spending Limits
Always set spending limits before leaving your agent running unattended:
- Anthropic: Hard spending cap — your API key stops working when the limit is hit. Set this to your absolute maximum.
- OpenAI: Alert-only threshold — you get an email but the API keeps working. Set this lower than your true limit as an early warning.
A good starting point: set your monthly limit to 2x what you expect to spend. This gives you headroom for experimentation while catching runaway costs.
Before vs After: $70 to $17
| Change | Before | After |
|---|---|---|
| Heartbeat interval | 30 min | 60 min |
| Heartbeat model | Claude Sonnet | Claude Haiku |
| soul.md tokens | ~3,000 | ~800 |
| Prompt caching | Off | On |
| Monthly cost | ~$70.80 | ~$17.00 |
Prompt Caching: The Advanced Move
Anthropic's prompt caching gives you a 90% discount on repeated input tokens. Since your system prompt, soul.md, and memory are largely the same on every call, caching can dramatically reduce costs on the input side.
How it works: the first call that establishes a given prompt prefix pays a small write premium; subsequent calls sharing that prefix read those tokens from the cache at 10% of the normal input rate. Because your system prompt and soul.md are identical from call to call, most of your input bill qualifies. One caveat: cache entries expire after a few minutes of inactivity, so an agent that calls infrequently will re-pay the write cost more often and see smaller savings.
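Here's an estimate of the effect on a stable prefix. It assumes Anthropic-style multipliers (cache reads at 10% of the base input rate, cache writes at a 25% premium) and, optimistically, that the cache never expires between calls, so treat the result as an upper bound:

```python
# Estimated effect of prompt caching on a stable prompt prefix.
# Assumes Anthropic-style multipliers: reads at 10% of the base input
# rate, writes at a 25% premium. Best case: the cache never expires.
BASE = 3.00 / 1_000_000        # $ per input token, base rate
CACHE_READ = 0.10 * BASE       # cached tokens on subsequent calls
CACHE_WRITE = 1.25 * BASE      # first call pays a write premium

prefix_tokens = 2_500  # system prompt + soul.md + stable memory
calls = 1_000          # calls that reuse the same prefix

uncached = prefix_tokens * calls * BASE
cached = prefix_tokens * (CACHE_WRITE + (calls - 1) * CACHE_READ)
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}, "
      f"saved: {1 - cached / uncached:.0%}")
# -> uncached: $7.50, cached: $0.76, saved: 90%
```

In practice the savings land somewhere below that 90% ceiling, since each cache expiry forces another write, but for a chatty agent the prefix stays warm most of the time.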
Action Checklist
- Set up a secondary (budget) model in your OpenClaw config
- Extend your heartbeat interval to 60 minutes
- Audit and prune your soul.md — target under 1,000 tokens
- Enable prompt caching if using Anthropic
- Set spending limits on your API provider
- Take a cost snapshot now — check again in one week to measure savings
Want Expert Cost Optimization?
If you're running OpenClaw for a business and want to squeeze every dollar, OpenClaw Consult does cost audits where we analyze your actual token usage and build an optimized config. Most clients see 60–80% cost reductions.
Frequently Asked Questions
Will using a cheaper model make my agent dumber?
Not for background tasks. The secondary model only handles heartbeats and memory management — tasks where the difference between Claude Sonnet and Claude Haiku is negligible. Your conversations still use the premium model.
How do I know if my soul.md is bloated?
If it's over 2,000 tokens, it's bloated. Paste it into a token counter (or the OpenClaw dashboard) and read every line. If a line isn't changing your agent's behavior, cut it.
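If you don't have a token counter handy, a rough heuristic is enough for a first pass. English prose averages about four characters per token, so a sketch like this (the 4-chars-per-token ratio is an approximation, not an exact count) tells you whether you're in bloat territory:

```python
# Rough token estimate for a prompt file. English text averages ~4
# characters per token, so this is a ballpark check, not an exact count.
def estimate_tokens(path):
    with open(path, encoding="utf-8") as f:
        return len(f.read()) // 4

# Usage: if estimate_tokens("soul.md") > 2_000, start pruning.
```

For an exact count, run the file through your provider's tokenizer instead; the heuristic is just for deciding whether an audit is worth your time.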
Is a 60-minute heartbeat too slow?
For most users, no. The heartbeat handles background tasks like checking reminders and compacting memory. Unless you need your agent to proactively check something every 30 minutes, 60 minutes is fine. You can always set it shorter for specific use cases.
Does prompt caching work with OpenAI?
OpenAI has a similar caching mechanism but it's less aggressive than Anthropic's. The two-tier model strategy works with both providers regardless of caching support.