OpenClaw on Your VPS Series
    Part 4 of 6

    LLM Configuration and Model Strategy

    Direct API, unified routing via OpenRouter, or fully local inference with Ollama — choose based on cost, privacy, and capability.

    25 minutes
    3 provider options

    Choosing a Model Strategy

    Pick the option that matches your top priority:

    Maximum capability. Use Claude or GPT-4o via direct API or OpenRouter. Best for complex instructions and prompt injection resistance.

    Cost efficiency at volume. Use OpenRouter with model failover. Route simple tasks to cheap models, complex tasks to heavy ones.

    Full data privacy. Use Ollama locally. No API calls leave your VPS. Tradeoff: less capable and needs more RAM (8 GB+ for 7B models).

    Flexibility. Use OpenRouter. One API key, one billing dashboard, 300+ models. Swap models without reconfiguring OpenClaw.

    Option A: Direct Anthropic Connection

    The simplest configuration and the recommended starting point. Sign up at console.anthropic.com and set a monthly spending limit immediately.

    Configure Anthropic
    openclaw config set agent.provider anthropic
    openclaw config set agent.api_key "sk-ant-your-key-here"
    openclaw config set agent.model claude-sonnet-4-5

    Available Models

    Model               Best For                                        Cost
    claude-opus-4-5     Complex reasoning, autonomous task chains       High
    claude-sonnet-4-5   Everyday tasks, balanced speed and capability   Medium
    claude-haiku-4-5    High-volume simple tasks, fast responses        Low
    Verify
    openclaw config verify
    openclaw gateway restart
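
    If verification fails, you can test the key against Anthropic's Messages API directly, bypassing OpenClaw entirely. The endpoint and headers below are Anthropic's standard API; substitute your real key:

```shell
curl -s https://api.anthropic.com/v1/messages \
  -H "x-api-key: sk-ant-your-key-here" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model": "claude-sonnet-4-5", "max_tokens": 32,
       "messages": [{"role": "user", "content": "ping"}]}'
```

    A JSON response with a content field means the key and model name are both valid; an error field tells you which one is wrong.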

    Option B: OpenRouter as a Unified Layer

    OpenRouter accepts a single API key and routes requests to the model you specify. One billing account, automatic failover, and access to 300+ models.
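
    The same kind of out-of-band check works here. OpenRouter exposes an OpenAI-compatible chat completions endpoint, so a plain curl confirms the key before you touch OpenClaw (substitute your real key):

```shell
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer sk-or-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{"model": "anthropic/claude-sonnet-4-5",
       "messages": [{"role": "user", "content": "ping"}]}'
```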

    Configure OpenRouter
    openclaw config set agent.provider openrouter
    openclaw config set agent.api_key "sk-or-your-key-here"
    openclaw config set agent.model anthropic/claude-sonnet-4-5

    Configure Model Failover

    Failover configuration
    openclaw config set agent.model anthropic/claude-sonnet-4-5
    openclaw config set agent.fallback_model openai/gpt-4o
    openclaw config set agent.fallback_on_error true

    Route Different Agents to Different Models

    Per-agent models
    # High-capability agent for complex work
    openclaw agents update work-agent --model anthropic/claude-opus-4-5
    
    # Lightweight agent for simple tasks
    openclaw agents update reminder-agent --model google/gemini-flash-1.5

    Option C: Ollama for Local Inference

    Ollama serves open-weight models locally. No API costs, no data leaving the server.

    System Requirements

    Model Size   Min RAM   Recommended
    7B params    8 GB      12 GB
    13B params   16 GB     20 GB
    33B params   32 GB     48 GB
    70B params   64 GB     80 GB
    Install Ollama
    curl -fsSL https://ollama.com/install.sh | sh
    
    # Pull a model
    ollama pull llama3.2:3b       # 2 GB, fits in 4 GB RAM
    ollama pull qwen2.5-coder:7b  # 4.7 GB, fits in 8 GB RAM
    
    # Test the model
    ollama run llama3.2:3b "Hello, are you working?"
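
    OpenClaw talks to Ollama over its REST API, so it is worth confirming the HTTP interface works too, not just the ollama run REPL:

```shell
# List installed models over HTTP
curl -s http://127.0.0.1:11434/api/tags

# One-shot generation (stream disabled for a single JSON response)
curl -s http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Hello, are you working?",
  "stream": false
}'
```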

    Secure Ollama

    Restrict Ollama to localhost:

    systemd override
    sudo systemctl edit ollama
    
    # Add to the [Service] section:
    [Service]
    Environment="OLLAMA_HOST=127.0.0.1:11434"
    
    sudo systemctl daemon-reload
    sudo systemctl restart ollama
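
    After the restart, confirm the daemon is actually bound to loopback only:

```shell
# Should show 127.0.0.1:11434 — not 0.0.0.0 or [::]
ss -ltn | grep 11434
```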
    Configure OpenClaw for Ollama
    openclaw config set agent.provider ollama
    openclaw config set agent.base_url "http://127.0.0.1:11434"
    openclaw config set agent.model qwen2.5-coder:7b
    
    openclaw config verify

    Ollama on a Separate GPU Server

    For faster inference, run Ollama on a GPU server (Vast.ai, Lambda Labs, Hetzner GPU) and point your VPS at it:

    Remote Ollama
    # On the GPU server
    export OLLAMA_HOST=0.0.0.0:11434
    ollama serve
    
    # On your OpenClaw VPS
    openclaw config set agent.provider ollama
    openclaw config set agent.base_url "http://gpu-server-ip:11434"
    openclaw config set agent.model llama3.3:70b

    Open port 11434 on the GPU server's firewall to your VPS IP only. Never expose Ollama to the public internet: it has no authentication of its own, so anyone who can reach the port can run inference on your hardware.
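
    Assuming the GPU server runs ufw (adjust for your firewall of choice; 203.0.113.10 below is a placeholder for your VPS IP):

```shell
# Allow only the OpenClaw VPS, then block everyone else on that port.
# ufw evaluates rules in order, so the allow must come first.
sudo ufw allow from 203.0.113.10 to any port 11434 proto tcp
sudo ufw deny 11434/tcp
sudo ufw status numbered
```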

    Session Pruning and Context Management

    Context configuration
    # Maximum context tokens before pruning
    openclaw config set agent.max_context_tokens 100000
    
    # Strategy: 'sliding' keeps recent, 'summarize' compresses old context
    openclaw config set agent.context_strategy sliding

    The summarize strategy costs more, since it spends model calls compressing old context, but it produces more coherent long-running memory than simply dropping it.
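
    As a rule of thumb (a common heuristic, not an OpenClaw-documented formula), one token is roughly four characters of English text, so you can estimate how close a session transcript is to the pruning threshold:

```shell
# Generate a sample 12,000-character transcript for illustration
printf 'hello world %.0s' $(seq 1 1000) > /tmp/session.log

# Rough heuristic: ~4 characters per token for English text
chars=$(wc -c < /tmp/session.log)
tokens=$((chars / 4))
echo "$tokens tokens (prune threshold: 100000)"
```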

    Model Verification Checklist

    Verify everything
    # Run doctor for overall health
    openclaw doctor
    
    # Verify model connection
    openclaw config verify
    
    # Check channel-agent assignments
    openclaw channels list
    openclaw agents list

    Every channel should show as connected and have an agent assigned. Send a test message on each channel and confirm a response arrives.