OpenClaw on Your VPS Series
    Part 4 of 6

    LLM Configuration and Model Strategy

    Direct API, unified routing via OpenRouter, or fully local inference with Ollama — choose based on cost, privacy, and capability.

    25 minutes
    3 provider options

    Choosing a Model Strategy

    Pick the option that matches your top priority:

    Maximum capability. Use Claude or GPT-4o via direct API or OpenRouter. Best for complex instructions and prompt injection resistance.

    Cost efficiency at volume. Use OpenRouter with model failover. Route simple tasks to cheap models, complex tasks to heavy ones.

    Full data privacy. Use Ollama locally. No API calls leave your VPS. Tradeoff: less capable and needs more RAM (8 GB+ for 7B models).

    Flexibility. Use OpenRouter. One API key, one billing dashboard, 300+ models. Swap models without reconfiguring OpenClaw.

    Option A: Direct Anthropic Connection

    The simplest configuration and the recommended starting point. Sign up at console.anthropic.com and set a monthly spending limit immediately.

    Configure Anthropic
    openclaw config set agent.provider anthropic
    openclaw config set agent.api_key "sk-ant-your-key-here"
    openclaw config set agent.model claude-sonnet-4-5

    Available Models

    Model               Best For                                        Cost
    claude-opus-4-5     Complex reasoning, autonomous task chains       High
    claude-sonnet-4-5   Everyday tasks, balanced speed and capability   Medium
    claude-haiku-4-5    High-volume simple tasks, fast responses        Low
    Verify
    openclaw config verify
    openclaw gateway restart
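
    If verification fails, you can test the key against Anthropic's Messages API directly, bypassing OpenClaw entirely. The endpoint and headers below are Anthropic's standard API; substitute your real key:

```shell
curl -s https://api.anthropic.com/v1/messages \
  -H "x-api-key: sk-ant-your-key-here" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model": "claude-sonnet-4-5", "max_tokens": 32,
       "messages": [{"role": "user", "content": "ping"}]}'
```

    A JSON response with a content field means the key and model name are both valid; an error field tells you which one is wrong.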

    Option B: OpenRouter as a Unified Layer

    OpenRouter accepts a single API key and routes requests to the model you specify. One billing account, automatic failover, and access to 300+ models.
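
    The same kind of out-of-band check works here. OpenRouter exposes an OpenAI-compatible chat completions endpoint, so a plain curl confirms the key before you touch OpenClaw (substitute your real key):

```shell
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer sk-or-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{"model": "anthropic/claude-sonnet-4-5",
       "messages": [{"role": "user", "content": "ping"}]}'
```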

    Configure OpenRouter
    openclaw config set agent.provider openrouter
    openclaw config set agent.api_key "sk-or-your-key-here"
    openclaw config set agent.model anthropic/claude-sonnet-4-5

    Configure Model Failover

    Failover configuration
    openclaw config set agent.model anthropic/claude-sonnet-4-5
    openclaw config set agent.fallback_model openai/gpt-4o
    openclaw config set agent.fallback_on_error true

    Route Different Agents to Different Models

    Per-agent models
    # High-capability agent for complex work
    openclaw agents update work-agent --model anthropic/claude-opus-4-5
    
    # Lightweight agent for simple tasks
    openclaw agents update reminder-agent --model google/gemini-flash-1.5

    Option C: Ollama for Local Inference

    Ollama serves open-weight models locally. No API costs, no data leaving the server.

    System Requirements

    Model Size   Min RAM   Recommended
    7B params    8 GB      12 GB
    13B params   16 GB     20 GB
    33B params   32 GB     48 GB
    70B params   64 GB     80 GB
    Install Ollama
    curl -fsSL https://ollama.com/install.sh | sh
    
    # Pull a model
    ollama pull llama3.2:3b       # 2 GB, fits in 4 GB RAM
    ollama pull qwen2.5-coder:7b  # 4.7 GB, fits in 8 GB RAM
    
    # Test the model
    ollama run llama3.2:3b "Hello, are you working?"
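
    OpenClaw talks to Ollama over its REST API, so it is worth confirming the HTTP interface works too, not just the ollama run REPL:

```shell
# List installed models over HTTP
curl -s http://127.0.0.1:11434/api/tags

# One-shot generation (stream disabled for a single JSON response)
curl -s http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Hello, are you working?",
  "stream": false
}'
```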

    Secure Ollama

    Restrict Ollama to localhost:

    systemd override
    sudo systemctl edit ollama
    
    # Add to the [Service] section:
    [Service]
    Environment="OLLAMA_HOST=127.0.0.1:11434"
    
    sudo systemctl daemon-reload
    sudo systemctl restart ollama
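
    After the restart, confirm the daemon is actually bound to loopback only:

```shell
# Should show 127.0.0.1:11434 — not 0.0.0.0 or [::]
ss -ltn | grep 11434
```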
    Configure OpenClaw for Ollama
    openclaw config set agent.provider ollama
    openclaw config set agent.base_url "http://127.0.0.1:11434"
    openclaw config set agent.model qwen2.5-coder:7b
    
    openclaw config verify

    Ollama on a Separate GPU Server

    For faster inference, run Ollama on a GPU server (Vast.ai, Lambda Labs, Hetzner GPU) and point your VPS at it:

    Remote Ollama
    # On the GPU server
    export OLLAMA_HOST=0.0.0.0:11434
    ollama serve
    
    # On your OpenClaw VPS
    openclaw config set agent.provider ollama
    openclaw config set agent.base_url "http://gpu-server-ip:11434"
    openclaw config set agent.model llama3.3:70b

    Open port 11434 on the GPU server's firewall to your VPS IP only. Never expose Ollama to the public internet: it has no authentication of its own, so anyone who can reach the port can run inference on your hardware.
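
    Assuming the GPU server runs ufw (adjust for your firewall of choice; 203.0.113.10 below is a placeholder for your VPS IP):

```shell
# Allow only the OpenClaw VPS, then block everyone else on that port.
# ufw evaluates rules in order, so the allow must come first.
sudo ufw allow from 203.0.113.10 to any port 11434 proto tcp
sudo ufw deny 11434/tcp
sudo ufw status numbered
```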

    Session Pruning and Context Management

    Context configuration
    # Maximum context tokens before pruning
    openclaw config set agent.max_context_tokens 100000
    
    # Strategy: 'sliding' keeps recent, 'summarize' compresses old context
    openclaw config set agent.context_strategy sliding

    The summarize strategy costs more, since it spends model calls compressing old context, but it produces more coherent long-running memory than simply dropping it.
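
    As a rule of thumb (a common heuristic, not an OpenClaw-documented formula), one token is roughly four characters of English text, so you can estimate how close a session transcript is to the pruning threshold:

```shell
# Generate a sample 12,000-character transcript for illustration
printf 'hello world %.0s' $(seq 1 1000) > /tmp/session.log

# Rough heuristic: ~4 characters per token for English text
chars=$(wc -c < /tmp/session.log)
tokens=$((chars / 4))
echo "$tokens tokens (prune threshold: 100000)"
```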

    Model Verification Checklist

    Verify everything
    # Run doctor for overall health
    openclaw doctor
    
    # Verify model connection
    openclaw config verify
    
    # Check channel-agent assignments
    openclaw channels list
    openclaw agents list

    Every channel should show as connected and have an agent assigned. Send a test message on each channel and confirm a response arrives.