Introduction

This is Day 3 of the 16-day OpenClaw Bootcamp. Today you'll learn exactly which model to use for each type of task, compare the major model providers OpenClaw supports, and connect a fully local Ollama model to your agent so it can run for $0 on your own hardware.

One of OpenClaw's most powerful features is its model-agnostic architecture. Your agent isn't locked to any single provider. You can switch between Claude, GPT, Gemini, or a local model — or use different models for different tasks — without changing anything else about your setup.

What You'll Build Today

  • A local Ollama model running on your own machine
  • An OpenClaw agent connected to that local model
  • A working setup that can switch between cloud and local models
  • A personal framework for choosing the right model based on budget, privacy, and task type

Video Tutorial

Watch the full Day 3 video for live model comparisons and an Ollama setup walkthrough.

Why Model-Agnostic Matters

Most AI tools lock you into a single provider. If OpenAI raises prices or Anthropic ships a breakthrough model, you're stuck. OpenClaw's model-agnostic architecture means you can:

  • Switch providers instantly — change one line in your config
  • Use different models for different tasks — premium for conversations, budget for background work
  • Run fully local — zero API costs, complete privacy, no internet required
  • Future-proof your setup — when a better model drops, you swap it in without rebuilding anything

Anthropic Models (Claude)

Claude is the most popular model choice for OpenClaw agents. The Claude family offers the best balance of reasoning quality, instruction following, and personality for agentic use cases.

  • Claude Opus: The most capable model — best for complex reasoning, nuanced writing, and tasks where quality matters more than speed
  • Claude Sonnet: The sweet spot — strong reasoning at a much lower cost than Opus. This is what most OpenClaw users run as their primary model
  • Claude Haiku: Fast and cheap — perfect as a secondary model for heartbeats and background tasks

OpenAI Models (GPT)

OpenAI models are well-supported in OpenClaw and offer strong performance across the board:

  • GPT-4o: OpenAI's flagship — strong at structured tasks, code generation, and tool use. Good alternative if you prefer OpenAI's ecosystem
  • GPT-4o-mini: The budget option — excellent cost efficiency for background tasks. Comparable to Claude Haiku in the secondary model role

Google Models (Gemini)

Google's Gemini models are newer to the OpenClaw ecosystem but increasingly capable:

  • Gemini Pro: Competitive with Claude Sonnet on many tasks, with generous rate limits and lower pricing
  • Gemini Flash: Extremely fast and cheap — a strong contender for secondary model duties

Gemini's main advantages are Google's generous free tier and large context windows. If you're cost-sensitive and starting out, it's worth testing.

Lower-Cost Chinese Models

Models like DeepSeek and Qwen deserve attention for specific use cases. They're significantly cheaper than Western alternatives and, depending on the task, surprisingly capable. They're particularly strong at code generation and structured data tasks.

The trade-offs: less reliable instruction following for nuanced agentic behavior, potential data privacy considerations, and occasional censorship on sensitive topics. But for budget-conscious users or as secondary models, they're legitimate options.

Real Cost Patterns

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best Role |
| --- | --- | --- | --- |
| Claude Opus | $15.00 | $75.00 | Premium tasks only |
| Claude Sonnet | $3.00 | $15.00 | Primary model (best value) |
| Claude Haiku | $0.25 | $1.25 | Secondary / background |
| GPT-4o | $2.50 | $10.00 | Primary (OpenAI users) |
| GPT-4o-mini | $0.15 | $0.60 | Secondary / background |
| Ollama (local) | $0.00 | $0.00 | Privacy / zero cost |

The cost mistake most people make: running their premium model for everything, including background heartbeats that happen 24–48 times per day. This is why Day 4 covers the two-tier model strategy in detail.
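To make the savings concrete, here's a back-of-the-envelope sketch using the per-1M-token prices from the table above. The token counts per heartbeat (1,000 in, 200 out) are illustrative assumptions, not measured values:

```python
# Rough monthly cost of background heartbeats on a premium vs. budget model.
# Prices are per 1M tokens (from the table above); the token counts per
# heartbeat are illustrative assumptions, not measurements.

def monthly_cost(in_price, out_price, heartbeats_per_day=48,
                 in_tokens=1_000, out_tokens=200, days=30):
    """Estimate monthly USD cost of heartbeat traffic at given per-1M-token prices."""
    per_beat = (in_tokens / 1e6) * in_price + (out_tokens / 1e6) * out_price
    return per_beat * heartbeats_per_day * days

sonnet = monthly_cost(3.00, 15.00)   # heartbeats on the primary model
haiku = monthly_cost(0.25, 1.25)     # same traffic on the budget model

print(f"Sonnet: ${sonnet:.2f}/mo, Haiku: ${haiku:.2f}/mo")
# -> Sonnet: $8.64/mo, Haiku: $0.72/mo
```

Under these assumptions, routing heartbeats to Haiku instead of Sonnet cuts that slice of your bill by roughly 12x — which is the whole point of the two-tier strategy covered on Day 4.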

Setting Up Ollama for Local Inference

Ollama lets you run LLMs directly on your machine — no API keys, no internet, no cost. Here's how to connect it to OpenClaw:

Step 1: Install Ollama

Download and install Ollama from the official website. It's available for macOS, Linux, and Windows. The installer handles everything.

Step 2: Pull a Model

Open your terminal and run ollama pull llama3 (or whichever model you want). This downloads the model weights to your machine. For OpenClaw, we recommend starting with llama3 (8B) for a good balance of quality and speed.

Step 3: Connect to OpenClaw

In your OpenClaw config, point the model setting to your Ollama instance. OpenClaw's interactive setup wizard can detect Ollama automatically. Once connected, test by sending a message — if your agent responds, it's working.
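If the agent doesn't respond, it helps to verify that Ollama itself is answering before debugging OpenClaw. Ollama serves a local HTTP API on port 11434; the sketch below builds a request for its generate endpoint (the request shape follows Ollama's documented API, but run it only with Ollama actually running):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model, prompt):
    """Build a non-streaming generate request for Ollama's local HTTP API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("llama3", "Say hello in five words.")
# With Ollama running, sending the request returns a JSON body whose
# "response" field holds the model's reply:
# reply = json.loads(urllib.request.urlopen(req).read())["response"]
```

If this round-trip works but your agent still doesn't respond, the problem is in your OpenClaw config rather than in Ollama.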

What Running Fully Local Means

When your agent runs on Ollama, nothing leaves your machine. No API calls, no data sent to cloud providers, no token costs. Your conversations, memory, and all processing happen entirely on your hardware. This is ideal for privacy-sensitive use cases or situations where you want zero ongoing costs.

Hardware Requirements

  • 8B models (llama3, mistral): 8GB RAM minimum, runs on most modern laptops. Expect 10–30 tokens/sec on Apple Silicon, slower on Intel.
  • 13B models: 16GB RAM recommended. Better quality, but noticeably slower without a GPU.
  • 70B+ models: Requires 64GB+ RAM or a dedicated GPU with 24GB+ VRAM. Desktop/server territory.

For most users, an 8B model on a modern MacBook is the practical sweet spot for local inference. It's not as smart as Claude Sonnet, but it's free and private.
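A rough way to sanity-check whether a model fits your machine: at 4-bit quantization, weights take well under one byte per parameter, plus a couple of gigabytes of runtime and context overhead. The 0.6 bytes/param and 2 GB figures below are ballpark assumptions, not measurements:

```python
def est_ram_gb(params_billion, bytes_per_param=0.6, overhead_gb=2.0):
    """Rough RAM estimate for a Q4-class quantized model: weights plus
    runtime/KV-cache overhead. Both constants are ballpark assumptions."""
    return params_billion * bytes_per_param + overhead_gb

for size in (8, 13, 70):
    print(f"{size}B model: ~{est_ram_gb(size):.0f} GB RAM")
```

These estimates line up with the guidance above: an 8B model fits comfortably in 8 GB, a 13B model wants 16 GB, and 70B-class models push you into 64 GB territory.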

Performance Tuning Tips

  • Close memory-heavy applications while running local models
  • Use quantized versions (Q4_K_M) for better speed with minimal quality loss
  • On Apple Silicon, Ollama automatically uses the GPU — this is why Macs are great for local AI
  • Set a longer context window only if you need it — larger context = more memory = slower inference

How to Choose the Right Model

  • Best overall quality: Claude Sonnet (primary) + Claude Haiku (secondary)
  • Best for code and structured tasks: GPT-4o or Claude Sonnet
  • Best for speed: Gemini Flash or Claude Haiku
  • Best for privacy: Ollama with a local model
  • Best for zero cost: Ollama or Google Gemini free tier
  • Best for maximum capability: Claude Opus (when quality justifies the cost)

How to Switch Models in OpenClaw

Switching models is a one-line config change. In your OpenClaw configuration file, update the model field to your desired provider and model name. Restart the gateway, and your agent is now using the new model — same memory, same soul.md, same channels. Nothing else changes.
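As a hypothetical illustration of what that one-line change looks like (the actual field names in your OpenClaw config file may differ — check your own config):

```python
# Hypothetical sketch of a one-line model switch. The "model" field name
# and "provider/model" naming scheme are assumptions for illustration.
config = {
    "model": "anthropic/claude-sonnet",  # current primary model
    # memory, soul.md, and channels live elsewhere and are untouched
}

config["model"] = "ollama/llama3"  # the one-line change: point at local Ollama
print(config["model"])
# -> ollama/llama3
```

Everything outside that one field stays exactly as it was.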

Need Help Choosing?

Model selection depends on your specific use case, budget, and privacy requirements. OpenClaw Consult helps clients pick the right model stack and configure dual-model setups optimized for their workload.

Frequently Asked Questions

Can I use multiple models at the same time?

Yes. OpenClaw supports a primary and secondary model. The primary handles conversations, the secondary handles background tasks. You can also switch models on the fly through config changes.
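The primary/secondary split can be sketched as a simple routing rule. The task names and model labels below are illustrative assumptions, not OpenClaw internals:

```python
# Sketch of the primary/secondary split described above. The routing rule,
# task names, and model labels are illustrative assumptions.
MODELS = {"primary": "claude-sonnet", "secondary": "claude-haiku"}
BACKGROUND_TASKS = {"heartbeat", "summarize", "cleanup"}

def pick_model(task):
    """Route background tasks to the cheap secondary model, everything else to primary."""
    role = "secondary" if task in BACKGROUND_TASKS else "primary"
    return MODELS[role]

print(pick_model("heartbeat"))  # -> claude-haiku
print(pick_model("chat"))       # -> claude-sonnet
```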

Is a local model good enough for a real agent?

For simple tasks — reminders, basic Q&A, note-taking — yes. For complex reasoning, nuanced writing, or tool use, cloud models are still significantly better. Many users run local for privacy-sensitive tasks and cloud for everything else.

Which local model should I start with?

Llama 3 8B. It's the best balance of quality, speed, and memory usage for most hardware. Once you're comfortable, you can experiment with larger models or specialized ones.

What if I switch models mid-conversation?

Your memory and soul.md carry over — those are stored in files, not in the model. Conversation context resets on model switch, but long-term memory persists. The new model picks up where the old one left off.