The difference between a $70 monthly bill and a $17 one is not quality — it’s configuration. Same agent, same tasks, same results. Just smarter about where every token goes.
Cheap model for background tasks, premium model for conversations. The single biggest cost lever.
Where most of the money actually goes — and why most people don’t realize it until the bill arrives.
System prompts and memory silently inflate every API call. Learn to audit and prune them.
Guardrails before your agent runs unattended. The cautionary tale you don’t want to live through.
Dashboard Cost tab and CLI stats. You can’t optimize what you can’t measure.
90% discount on repeated system prompts. A powerful feature most people don’t know exists.
Six concrete actions that cut your bill by over seventy percent. Do all of them today.
Configure dual-model, extend heartbeat, prune SOUL.md, check spending. Save money this week.
Most of that $70 is going to background tasks that don’t need a frontier model. Your heartbeat is doing structured, repetitive checks. It doesn’t need Opus. That’s the problem. Now let’s fix it.
Your SOUL.md and memory context get sent with every single request. A 2,000-token SOUL.md plus 1,000 tokens of memory = 3,000 overhead tokens before the model even sees your question. At 48 heartbeat cycles/day that’s 144,000 overhead tokens daily that have nothing to do with the monitoring task.
In a normal chatbot, the system prompt is a rounding error. In an agent running automated cycles every 30 minutes, it’s the main cost driver. Prompt efficiency is not a nice-to-have — it’s essential.
Claude Haiku 4.5 at $1/$5 or GPT-4o Mini at $0.15/$0.60. Handles heartbeat cycles, RSS scanning, server checks, calendar monitoring. More than enough capability for structured, repetitive tasks.
Claude Opus 4.6 or Sonnet 4.6. When you’re actively talking to your agent, you get premium reasoning. The price is worth it for real conversations. Not for checking if your server is up.
That $25.20/month heartbeat cost on Opus drops to $5.04/month on Haiku. Same checks. Same frequency. 80% savings. Add interactive savings from Sonnet vs Opus and total bill becomes a fraction of what you started with.
OpenClaw automatically routes heartbeat cycles and background tasks to secondary, and direct messages from you to primary. No manual switching.
Save the file and restart the Gateway. Every heartbeat cycle now uses your cheap model. Every direct message uses your premium one. One config change — immediate savings from this moment forward.
| Model | Cycles/Day | Input Cost/Day | Output Cost/Day | Total/Day | Total/Month |
|---|---|---|---|---|---|
| Opus 4.6 | 48 (30-min) | $0.24 | $0.60 | $0.84 | $25.20 |
| Haiku 4.5 | 48 (30-min) | $0.048 | $0.12 | $0.168 | $5.04 |
| Haiku 4.5 | 24 (60-min) | $0.024 | $0.06 | $0.084 | $2.52 |
Assumes 1,000 input tokens and 500 output tokens per cycle. Haiku at $1/$5 per million tokens.
Switching to Haiku saves 80% on heartbeat costs. Extending to 60-minute cycles saves another 50% on top of that. The quality difference for “check if my server is responding” is negligible. Haiku handles structured, repetitive monitoring perfectly.
Server uptime? An hour between checks is fine. RSS feeds? News doesn’t go stale in 30 minutes. Calendar? Your meetings aren’t reshuffling every half hour. Most users run 30-minute heartbeats because that was the default — not because their use case demands it.
Start at 60 minutes. You still get timely monitoring and you’ve halved your heartbeat costs. Move to 30 minutes only if you find a specific task that genuinely needs it.
A SOUL.md that’s grown to 3,000 tokens over time means 3,000 tokens of input on every single call. At 48 heartbeat cycles/day that’s 144,000 overhead tokens daily. On Opus: $0.72/day or $21.60/month. Just from a bloated system prompt.
Your SOUL.md is a tax on every API call. Keep it as lean as possible. The goal isn’t a short SOUL.md — it’s an efficient one. Every word should earn its token cost.
OpenClaw’s memory accumulates over time by design. But old, outdated entries still cost tokens on every call. Run openclaw memory stats to see what your agent is carrying.
Agent learned the same preference twice? Delete one. Memory doesn’t need redundancy.
Old project status, past events, preferences you’ve changed. Gone from memory = gone from every call’s cost.
“Prefers bullet-point format with technical detail and code examples” → “Prefers: bullets, technical, code examples.” Same info, fewer tokens.
Set a monthly usage limit at console.anthropic.com. When you hit it, API calls stop. Hard cap, no surprises. Start at $10–$20 while you’re learning. Raise it when you understand your actual patterns.
Set a monthly spend threshold — OpenAI will email you when you cross it, but as of 2025 these limits are alert-only and do not hard-stop API usage. Start at $10–$15 and monitor closely. Both providers offer per-project API keys with individual limits — create separate keys if running multiple agents so one runaway process can’t drain everything.
Someone set up aggressive heartbeat monitoring with Opus, accidentally created a feedback loop in their task config, and burned through 180 million tokens in a few weeks. Hundreds of dollars. A $5 spending limit would have caught it in hours.
Set the limit now. Raise it later.
| Model | Input / M tokens | Output / M tokens | Best For |
|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | High-value interactive tasks requiring deep reasoning |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Daily interactive use — best default for conversations |
| Claude Haiku 4.5 | $1.00 | $5.00 | ★ Heartbeat & background automation |
| GPT-4o Mini | $0.15 | $0.60 | Budget background tasks — cheaper than Haiku |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | High-volume automation — Google’s ultra-cheap option |
| Ollama (local) | $0.00 | $0.00 | Free forever — good for heartbeat if hardware allows |
Llama 3.2 via Ollama comes in 1B and 3B text models plus 11B and 90B vision variants. The 3B (default with ollama pull llama3.2) is surprisingly capable for structured monitoring tasks.
Token usage broken down by model, task type, and time period. Check weekly. Look for patterns — is one heartbeat task consuming disproportionate tokens?
Run openclaw stats --this-month for a quick summary. Total tokens, estimated cost by provider, heartbeat vs interactive breakdown.
Set up cost alerts so your agent messages you when you’ve hit 50% of your monthly budget. A mid-month warning beats a surprise bill every time.
Check costs weekly for the first month. That’s when you catch misconfiguration and unexpected usage. After that, monthly is fine — you’ll know your patterns.
cache_control markers in promptsYour system prompt is identical on every heartbeat cycle. That’s a perfect cache target. If your system prompt is 2,000 tokens and you run 48 heartbeat cycles a day, caching saves you the equivalent of tens of thousands of input tokens per day. On Opus pricing, that’s real dollars every single day.
Set agents.defaults.model.secondary to Haiku or GPT-4o Mini. Biggest single lever.
Change agents.defaults.heartbeat.every from "30m" to "60m". Half the calls, half the cost.
Cut to essential directives only. Every token you remove saves money on every single call.
openclaw memory stats — remove duplicates, condense verbose entries, delete outdated info.
$10–$20 on Anthropic and/or OpenAI dashboards. Set it now. Raise it when you know your patterns.
You can’t optimize what you don’t measure. Make it a habit for the first month.
Add secondary to your model config. Set primary to your premium model, secondary to Haiku or GPT-4o Mini. Restart the Gateway and verify both are being used via the Dashboard.
Update heartbeat.every to "60m". Confirm the new cycle timing in the Dashboard heartbeat tab.
Read your SOUL.md and trim anything verbose or redundant. Run openclaw memory stats and prune old entries. Note how many tokens you saved.
Run openclaw stats --this-month. Set a spending limit on your API provider if you haven’t. Screenshot your current cost to compare after a week of optimized config.
Your agent is running and optimized. Now it needs to live where you already are. On Day 5 we connect Telegram, WhatsApp, Discord, iMessage, Slack, and the built-in WebChat — and cover security, formatting, and which channel to reach for in every situation.