Introduction

This is Day 4 of the 16-day OpenClaw Bootcamp. Today we tackle the question everyone asks after their first week: "Why is my API bill so high?" The answer is almost always the same — your heartbeat is too frequent, your prompts are too bloated, and you're using your premium model for everything.

By the end of this session, you'll have a dual-model config, a pruned soul.md, and spending limits in place. Most users see their monthly costs drop from ~$70 to ~$17 after making these changes.

What You'll Build Today

  • A dual-model config with a premium model for conversations and a budget model for background tasks
  • A pruned soul.md that stops bleeding tokens on every API call
  • Spending limits and alerts so your agent never runs away with your budget
  • A baseline cost snapshot to measure your savings after a week

What an Unoptimized Deployment Costs

A typical unoptimized OpenClaw setup running Claude 3.5 Sonnet with a 30-minute heartbeat and a verbose soul.md costs roughly $70/month in API fees. That's not because OpenClaw is expensive — it's because LLM tokens add up fast when your agent runs 24/7 and you haven't tuned the defaults.

The three biggest cost drivers:

  1. Heartbeat frequency — every cycle sends your full system prompt to the model
  2. Prompt size — a bloated soul.md means more input tokens on every single call
  3. Model choice — using your premium model for background tasks that don't need it

Understanding Tokens: Input vs Output

LLM pricing has two components: input tokens (what you send to the model) and output tokens (what the model generates). Input tokens are cheaper per token but add up fast, because your system prompt, memory, and conversation history are resent on every call. Output tokens cost more per token, but there are far fewer of them.

For a running OpenClaw agent, input tokens are typically 80–90% of your bill. That's why reducing prompt size has such a massive impact.
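To see why input dominates, here's a back-of-envelope cost model. All figures are assumptions for illustration, not measured OpenClaw numbers: Sonnet-class pricing of $3 per million input tokens and $15 per million output tokens, 100 calls/day, ~2,000 input tokens and ~100 output tokens per call.

```python
# Back-of-envelope monthly cost split for an always-on agent.
# All figures below are illustrative assumptions, not measured numbers.
INPUT_PRICE = 3.00 / 1_000_000    # $/input token (Sonnet-class, assumed)
OUTPUT_PRICE = 15.00 / 1_000_000  # $/output token (assumed)

calls_per_day = 100    # heartbeats + conversation turns (assumed)
input_per_call = 2_000  # system prompt + soul.md + memory + history
output_per_call = 100   # short replies and heartbeat acknowledgements

input_cost = 30 * calls_per_day * input_per_call * INPUT_PRICE
output_cost = 30 * calls_per_day * output_per_call * OUTPUT_PRICE

total = input_cost + output_cost
print(f"input:  ${input_cost:.2f}/mo ({input_cost / total:.0%} of the bill)")
print(f"output: ${output_cost:.2f}/mo")
```

With these assumptions, input tokens account for $18.00 of a $22.50 monthly bill, i.e. 80% of it, right in the range above. Plug in your own call volume and prices to see your split.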

The Two-Tier Model Strategy

The single most impactful optimization: use a premium model (Claude Sonnet, GPT-4o) for conversations and a budget model (Claude Haiku, GPT-4o-mini) for background tasks like heartbeats and memory management.

In your OpenClaw config, set:

  • Primary model: Your best model — handles direct conversations with you
  • Secondary model: A cheaper model — handles heartbeats, memory compaction, and other automated tasks

The quality difference for background tasks is negligible. Your agent doesn't need Claude Sonnet to check if it has any scheduled reminders. But the cost difference is 5–10x.
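As a sketch, a two-tier setup might look like the following. Note that the keys and model names here are illustrative, not OpenClaw's actual config schema; check your version's documentation for the real field names.

```python
# Hypothetical two-tier model config. The keys below are illustrative --
# OpenClaw's real config schema may use different names.
config = {
    "models": {
        "primary": "claude-3-5-sonnet-latest",   # direct conversations
        "secondary": "claude-3-5-haiku-latest",  # background work
    },
    "routing": {
        "conversation": "primary",
        "heartbeat": "secondary",          # cheap model for the background cycle
        "memory_compaction": "secondary",  # summarization doesn't need Sonnet
    },
}
```

The point of the routing table is that every automated task gets an explicit tier, so nothing silently falls back to the expensive model.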

The Heartbeat Math

The heartbeat is OpenClaw's background processing cycle — it's how your agent "thinks" when you're not talking to it. Each heartbeat sends the full system prompt to the model.

The math with a 30-minute heartbeat:

  • 48 heartbeats/day × ~2,000 input tokens = 96,000 tokens/day
  • At Claude Sonnet input pricing ($3 per million tokens): ~$0.29/day, or ~$8.64/month just for heartbeats

Switch to a 60-minute heartbeat:

  • 24 heartbeats/day × ~2,000 input tokens = 48,000 tokens/day
  • At Claude Haiku input pricing ($0.25 per million tokens, as the secondary model): ~$0.012/day, or ~$0.36/month

Together, extending the interval and switching heartbeats to the budget model saves about $8.28/month on heartbeats alone.
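The arithmetic above generalizes to any interval, prompt size, and price. A small helper makes it easy to test your own numbers; the prices are the same assumptions as above (Sonnet $3 and Haiku $0.25 per million input tokens; verify current rates with your provider).

```python
# Monthly heartbeat cost for a given interval, prompt size, and input price.
# Prices assumed: Sonnet $3/M input tokens, Haiku $0.25/M input tokens.
def monthly_heartbeat_cost(interval_min: int, tokens_per_beat: int,
                           price_per_mtok: float) -> float:
    beats_per_day = 24 * 60 // interval_min
    return 30 * beats_per_day * tokens_per_beat * price_per_mtok / 1_000_000

before = monthly_heartbeat_cost(30, 2_000, 3.00)  # Sonnet, 30-min interval
after = monthly_heartbeat_cost(60, 2_000, 0.25)   # Haiku, 60-min interval
print(f"before ${before:.2f}/mo, after ${after:.2f}/mo, saved ${before - after:.2f}")
```

This reproduces the figures above: $8.64/month before, $0.36/month after.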

Prompt Bloat: The Invisible Cost Driver

Your soul.md is included in every API call. If it's 3,000 tokens when it could be 800, you're paying 3.75x more than necessary on every single interaction.

How to audit:

  • Open your soul.md and read every line critically
  • Remove placeholder text, redundant instructions, and examples your agent has already learned
  • Consolidate similar instructions into concise statements
  • Move context that doesn't need to be in every call to memory files instead

A well-pruned soul.md is usually 500–1,000 tokens. If yours is over 2,000, there's almost certainly bloat to cut.
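A rough audit script can flag bloat before you start cutting. This sketch uses the common ~4 characters/token heuristic; a real tokenizer (e.g. tiktoken for OpenAI-family models) gives exact counts.

```python
# Rough soul.md token audit using the ~4 chars/token rule of thumb.
def estimate_tokens(text: str) -> int:
    return len(text) // 4

def audit(path: str = "soul.md", target: int = 1_000) -> None:
    with open(path, encoding="utf-8") as f:
        tokens = estimate_tokens(f.read())
    verdict = "looks lean" if tokens <= target else f"~{tokens - target} tokens over target"
    print(f"{path}: ~{tokens} tokens ({verdict})")
```

Run audit() from your agent's workspace directory; anything well over the 1,000-token target is a pruning candidate.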

Setting Spending Limits

Always set spending limits before leaving your agent running unattended:

  • Anthropic: Hard spending cap — your API key stops working when the limit is hit. Set this to your absolute maximum.
  • OpenAI: Alert-only threshold — you get an email but the API keeps working. Set this lower than your true limit as an early warning.

A good starting point: set your monthly limit to 2x what you expect to spend. This gives you headroom for experimentation while catching runaway costs.

Before vs After: $70 to $17

Change               Before          After
Heartbeat interval   30 min          60 min
Heartbeat model      Claude Sonnet   Claude Haiku
soul.md tokens       ~3,000          ~800
Prompt caching       Off             On
Monthly cost         ~$70.80         ~$17.00

Prompt Caching: The Advanced Move

Anthropic's prompt caching gives you a 90% discount on repeated input tokens. Since your system prompt, soul.md, and memory are largely the same on every call, caching can dramatically reduce costs on the input side.

How it works: the first call with a given prompt prefix writes it to the cache at a small surcharge over the normal input rate; subsequent calls that reuse the same prefix read those tokens at roughly 10% of the normal price. Since your system prompt and soul.md are identical across calls, most of your input tokens become cheap cache reads. One caveat: cache entries expire after a short idle window (Anthropic's default is five minutes), so caching pays off most during active conversations rather than across widely spaced heartbeats.
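Here's a rough feel for the input-side effect, assuming a $3/M-token input price, 6M input tokens per month, and 90% of those tokens served from the cached prefix. All three numbers are illustrative.

```python
# Illustrative effect of a 90% cache-read discount on input cost.
price_per_mtok = 3.00             # assumed input price ($/M tokens)
monthly_input_tokens = 6_000_000  # assumed monthly volume
cached_fraction = 0.9             # assumed share of input tokens read from cache

uncached = monthly_input_tokens * price_per_mtok / 1_000_000
cached = uncached * ((1 - cached_fraction) + cached_fraction * 0.10)
print(f"without caching: ${uncached:.2f}/mo  with caching: ${cached:.2f}/mo")
```

Under these assumptions the input bill drops from $18.00 to $3.42 per month. Real savings depend on your actual cache-hit rate and the cache-write surcharge.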

Action Checklist

  1. Set up a secondary (budget) model in your OpenClaw config
  2. Extend your heartbeat interval to 60 minutes
  3. Audit and prune your soul.md — target under 1,000 tokens
  4. Enable prompt caching if using Anthropic
  5. Set spending limits on your API provider
  6. Take a cost snapshot now — check again in one week to measure savings

Want Expert Cost Optimization?

If you're running OpenClaw for a business and want to squeeze every dollar, OpenClaw Consult does cost audits where we analyze your actual token usage and build an optimized config. Most clients see 60–80% cost reductions.

Frequently Asked Questions

Will using a cheaper model make my agent dumber?

Not for background tasks. The secondary model only handles heartbeats and memory management — tasks where the difference between Claude Sonnet and Claude Haiku is negligible. Your conversations still use the premium model.

How do I know if my soul.md is bloated?

If it's over 2,000 tokens, it's bloated. Paste it into a token counter (or the OpenClaw dashboard) and read every line. If a line isn't changing your agent's behavior, cut it.

Is a 60-minute heartbeat too slow?

For most users, no. The heartbeat handles background tasks like checking reminders and compacting memory. Unless you need your agent to proactively check something every 30 minutes, 60 minutes is fine. You can always set it shorter for specific use cases.

Does prompt caching work with OpenAI?

OpenAI offers automatic prompt caching as well, with a 50% discount on cached input tokens for prompts over 1,024 tokens, which is less aggressive than Anthropic's 90%. The two-tier model strategy works with both providers regardless of caching support.