    Part 6 of 6

    Production Hardening & Monitoring

    Deploy Agent Zero behind a Caddy reverse proxy with SSL, authentication, Docker security hardening, resource limits, and comprehensive monitoring for production workloads.

    Running Agent Zero for personal experimentation is one thing. Running it reliably for production workloads, where downtime has a real cost and a security breach could expose sensitive data, requires additional hardening. This guide covers reverse proxy configuration, SSL/TLS, authentication, resource management, monitoring, backups, updates, and troubleshooting.

    1. Reverse Proxy with Caddy

    Exposing Agent Zero directly on port 5000 works for testing, but production deployments should sit behind a reverse proxy. Caddy handles SSL certificates automatically and provides a clean security boundary.

    Install Caddy

    Install Caddy on Ubuntu
    sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl
    curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
    curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
    sudo apt update
    sudo apt install caddy -y

    Configure Caddy with Automatic HTTPS

    Create or edit /etc/caddy/Caddyfile:

    /etc/caddy/Caddyfile
    agent.yourdomain.com {
        reverse_proxy localhost:5000
        
        encode gzip
        
        header {
            # Security headers
            Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
            X-Content-Type-Options "nosniff"
            X-Frame-Options "DENY"
            Referrer-Policy "strict-origin-when-cross-origin"
            
            # Remove server identification
            -Server
        }
        
        log {
            output file /var/log/caddy/agent-zero.log
            format json
        }
    }

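    Replace agent.yourdomain.com with your actual domain. Its DNS A (or AAAA) record must point to your server's IP, and ports 80 and 443 must be reachable from the internet, before Caddy can obtain a certificate.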

    Restrict Agent Zero to Localhost

    Now that Caddy handles external traffic, bind Agent Zero only to localhost. Edit .env:

    ~/agent-zero/.env
    WEB_UI_HOST=127.0.0.1
    WEB_UI_PORT=5000

    Restart both services:

    sudo systemctl restart agent-zero
    sudo systemctl restart caddy

    Update Firewall

    Remove direct access to port 5000 and ensure HTTP/HTTPS are open:

    sudo ufw delete allow 5000/tcp
    sudo ufw allow 80/tcp
    sudo ufw allow 443/tcp
    sudo ufw reload
    
    # Verify HTTPS works
    curl -I https://agent.yourdomain.com

    2. Authentication Setup

    Agent Zero doesn't include built-in authentication. Add a layer through Caddy.

    Basic Authentication

    Generate a password hash:

    caddy hash-password

    Enter your desired password when prompted. Copy the output hash.

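    Update the Caddyfile. On Caddy 2.8 and later the directive is spelled basic_auth; the older basicauth spelling shown here is still accepted but logs a deprecation warning: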

    /etc/caddy/Caddyfile with Basic Auth
    agent.yourdomain.com {
        basicauth {
            admin $2a$14$your-hashed-password-here
        }
        
        reverse_proxy localhost:5000
        
        encode gzip
        
        header {
            Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
            X-Content-Type-Options "nosniff"
            X-Frame-Options "DENY"
            Referrer-Policy "strict-origin-when-cross-origin"
            -Server
        }
        
        log {
            output file /var/log/caddy/agent-zero.log
            format json
        }
    }

    Reload Caddy to apply the new configuration:

    sudo systemctl reload caddy
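
    One way to confirm that Caddy is enforcing the credentials is to request the site once without an Authorization header and once with one. The sketch below uses only Python's standard library. The helper names basic_auth_header and status_for are hypothetical (they are not part of Caddy or Agent Zero), and the URL and password in the usage note are placeholders.

```python
import base64
import urllib.error
import urllib.request
from typing import Optional


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def status_for(url: str, auth: Optional[str] = None, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request, optionally with credentials."""
    request = urllib.request.Request(url)
    if auth is not None:
        request.add_header("Authorization", auth)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still the answer.
        return err.code
```

    With basic auth active, status_for("https://agent.yourdomain.com") should come back as 401, and the same call with auth=basic_auth_header("admin", "your-password") should return the status Agent Zero itself serves.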

    IP Allowlisting

    To limit exposure further, restrict access to specific IP addresses or ranges. Requests from any other address receive a 403:

    IP-restricted Caddyfile
    agent.yourdomain.com {
        @blocked not remote_ip 203.0.113.0/24 198.51.100.50
        respond @blocked "Access denied" 403
        
        reverse_proxy localhost:5000
        
        # ... rest of config
    }
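
    To see which clients a rule like this admits, the same matching logic can be reproduced with Python's ipaddress module. This is only an illustration of CIDR matching, not how Caddy implements remote_ip; the function name is_allowed and the test addresses are made up for the example.

```python
import ipaddress

# Same ranges as the remote_ip matcher in the Caddyfile above.
ALLOWED_RANGES = ["203.0.113.0/24", "198.51.100.50"]


def is_allowed(client_ip: str, ranges=ALLOWED_RANGES) -> bool:
    """Return True if client_ip falls inside any allowed address or CIDR range."""
    address = ipaddress.ip_address(client_ip)
    # ip_network() treats a bare address as a single-host network (/32 for IPv4).
    return any(address in ipaddress.ip_network(r) for r in ranges)


for ip in ["203.0.113.17", "198.51.100.50", "198.51.100.51", "192.0.2.10"]:
    print(ip, "allowed" if is_allowed(ip) else "denied (403)")
```

    The first two addresses are allowed and the last two are denied. Keep in mind that remote_ip matches the address of the direct connection, so if another proxy or CDN sits in front of Caddy, every request will appear to come from that proxy.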

    Combining Methods

    Use both authentication and IP restrictions:

    Combined Authentication + IP Restriction
    agent.yourdomain.com {
        @allowed remote_ip 203.0.113.0/24 198.51.100.50
        
        handle @allowed {
            basicauth {
                admin $2a$14$your-hashed-password-here
            }
            reverse_proxy localhost:5000
        }
        
        handle {
            respond "Access denied" 403
        }
        
        # ... headers and logging
    }

    3. Docker Security Hardening

    Agent Zero executes arbitrary code inside Docker containers. Harden this execution environment.

    Resource Limits (Daemon-Level)

    Set default file-descriptor and process-count limits for every container, and cap container log size so that logs cannot fill the disk:

    /etc/docker/daemon.json
    {
      "default-ulimits": {
        "nofile": {
          "Name": "nofile",
          "Hard": 65536,
          "Soft": 65536
        },
        "nproc": {
          "Name": "nproc",
          "Hard": 4096,
          "Soft": 4096
        }
      },
      "log-driver": "json-file",
      "log-opts": {
        "max-size": "100m",
        "max-file": "3"
      }
    }

    Restart Docker to apply the settings:

    sudo systemctl restart docker
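
    A syntax error in daemon.json keeps the Docker daemon from starting, so it is worth checking the file before restarting Docker. The following is a minimal sketch of such a check; check_daemon_config is a hypothetical helper, and it only validates the JSON syntax and the ulimit values used in this guide, not every option Docker accepts.

```python
import json


def check_daemon_config(text: str) -> list:
    """Return a list of problems found in a daemon.json document (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]

    ulimits = config.get("default-ulimits", {})
    if not isinstance(ulimits, dict):
        return ["default-ulimits must be a JSON object"]

    problems = []
    for name, limit in ulimits.items():
        if not isinstance(limit, dict):
            problems.append(f"ulimit {name}: expected an object with Soft and Hard")
            continue
        soft, hard = limit.get("Soft"), limit.get("Hard")
        if not isinstance(soft, int) or not isinstance(hard, int):
            problems.append(f"ulimit {name}: Soft and Hard must be integers")
        elif soft > hard:
            problems.append(f"ulimit {name}: Soft ({soft}) exceeds Hard ({hard})")
    return problems
```

    Passing the contents of /etc/docker/daemon.json to check_daemon_config and restarting only when the returned list is empty avoids taking Docker down with a stray comma. If your Docker release supports it, dockerd --validate offers a more complete check.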

    Container-Specific Limits

    Configure resource limits for the Agent Zero execution container in .env:

    ~/agent-zero/.env
    # Docker Resource Limits
    CODE_EXEC_DOCKER_CPU_LIMIT=2.0
    CODE_EXEC_DOCKER_MEMORY_LIMIT=2g
    CODE_EXEC_DOCKER_PIDS_LIMIT=256

    Network Isolation

    For sensitive environments, restrict container network access:

    # Create isolated network
    docker network create --internal agent-zero-internal
    
    # Then add this line to ~/agent-zero/.env (it is a setting, not a shell command)
    CODE_EXEC_DOCKER_NETWORK=agent-zero-internal

    With --internal, containers on this network can talk to each other but have no route to the outside world, so code the agent runs cannot download packages or call external APIs.

    Read-Only Root Filesystem

    For additional hardening:

    ~/agent-zero/.env
    CODE_EXEC_DOCKER_READ_ONLY=true
    CODE_EXEC_DOCKER_TMPFS=/tmp:rw,size=512m

    4. System Resource Management

    Memory Limits for Agent Zero

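    Add memory limits to the systemd service. MemoryHigh is a soft threshold above which the service is throttled and its memory reclaimed aggressively; MemoryMax is the hard cap at which the kernel's OOM killer steps in.

    sudo systemctl edit agent-zero

    In the editor that opens, add:

    [Service]
    MemoryMax=4G
    MemoryHigh=3G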

    Swap Configuration

    Ensure adequate swap for memory-intensive operations:

    # Check current swap
    free -h
    
    # Create 4GB swap file if needed
    sudo fallocate -l 4G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    
    # Make permanent
    echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
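
    Sustained swap use is a sign that the memory limits above are too tight for the workload. As a rough sketch, swap usage can be read from /proc/meminfo, which reports values in kibibytes; parse_meminfo and swap_summary are hypothetical helper names, and the approach is Linux-only.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo content into a {field: kibibytes} dictionary."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_summary(text: str) -> str:
    """Summarize total swap and the share currently in use."""
    info = parse_meminfo(text)
    total, free = info.get("SwapTotal", 0), info.get("SwapFree", 0)
    if total == 0:
        return "no swap configured"
    used_pct = 100 * (total - free) / total
    # 1 GiB = 1048576 KiB
    return f"swap: {total / 1048576:.1f} GiB total, {used_pct:.0f}% in use"
```

    On the server, swap_summary(open("/proc/meminfo").read()) gives a one-line status that could be appended to the health check log.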

    Process Limits

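    Prevent fork bombs and runaway processes. The limits below are applied by pam_limits to the agentzero user's PAM sessions, such as SSH logins; services started by systemd bypass them, so for the agent-zero service itself also set LimitNPROC=, LimitNOFILE=, or TasksMax= in the unit's [Service] section.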

    /etc/security/limits.d/agent-zero.conf
    agentzero soft nproc 256
    agentzero hard nproc 512
    agentzero soft nofile 65536
    agentzero hard nofile 65536
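
    Because limits are inherited from the parent process, the values a process actually runs with depend on how it was started. A small sketch using Python's resource module shows what the current process sees; describe_limits is a hypothetical helper name.

```python
import resource


def describe_limits() -> dict:
    """Return the (soft, hard) nproc and nofile limits of the current process."""
    names = {"nproc": resource.RLIMIT_NPROC, "nofile": resource.RLIMIT_NOFILE}
    limits = {}
    for name, which in names.items():
        soft, hard = resource.getrlimit(which)
        # RLIM_INFINITY means "no limit"; report it as None for readability.
        limits[name] = tuple(
            None if value == resource.RLIM_INFINITY else value for value in (soft, hard)
        )
    return limits


for name, (soft, hard) in describe_limits().items():
    print(f"{name}: soft={soft} hard={hard}")
```

    Running this from an agentzero login shell shows the limits.d values, while running it from inside the service shows whatever systemd applied to the unit.
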

    5. Monitoring with Netdata

    Netdata provides real-time monitoring with minimal configuration:

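    curl -fsSL https://get.netdata.cloud/kickstart.sh -o /tmp/netdata-kickstart.sh
    sh /tmp/netdata-kickstart.sh

    By default the dashboard listens on http://your-server-ip:19999. Rather than opening that port in the firewall, keep it closed and let Caddy proxy the dashboard over HTTPS with authentication: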

    Add to Caddyfile
    monitoring.yourdomain.com {
        basicauth {
            admin $2a$14$your-hashed-password-here
        }
        
        reverse_proxy localhost:19999
    }

    What Netdata Monitors

    • CPU, memory, disk, and network usage
    • Docker container metrics
    • Process-level resource consumption
    • System health alerts

    6. Logging & Health Checks

    Structured Application Logging

    ~/agent-zero/logging_config.json
    {
      "version": 1,
      "disable_existing_loggers": false,
      "formatters": {
        "json": {
          "class": "pythonjsonlogger.jsonlogger.JsonFormatter",
          "format": "%(asctime)s %(name)s %(levelname)s %(message)s"
        }
      },
      "handlers": {
        "file": {
          "class": "logging.handlers.RotatingFileHandler",
          "filename": "/var/log/agent-zero/app.log",
          "maxBytes": 104857600,
          "backupCount": 5,
          "formatter": "json"
        }
      },
      "root": {
        "level": "INFO",
        "handlers": ["file"]
      }
    }

    Create the log directory and install the JSON formatter package:

    sudo mkdir -p /var/log/agent-zero
    sudo chown agentzero:agentzero /var/log/agent-zero
    
    cd ~/agent-zero
    source venv/bin/activate
    pip install python-json-logger
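
    The file above follows the schema of Python's logging.config.dictConfig. Whether Agent Zero picks it up on its own depends on your version; assuming it does not, a Python entry point could apply it at startup roughly as follows. The function name configure_logging is hypothetical.

```python
import json
import logging
import logging.config
from pathlib import Path


def configure_logging(config_path: str) -> None:
    """Load a dictConfig-style JSON file and apply it to the logging module."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    logging.config.dictConfig(config)
```

    After calling configure_logging("/home/agentzero/agent-zero/logging_config.json") once, any logging.getLogger(...).info(...) call in the process is written to /var/log/agent-zero/app.log as one JSON object per line.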

    Log Rotation

    /etc/logrotate.d/agent-zero
    /var/log/agent-zero/*.log {
        daily
        rotate 14
        compress
        delaycompress
        missingok
        notifempty
        create 640 agentzero agentzero
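        postrotate
            # reload only succeeds if the unit defines ExecReload=; otherwise
            # use copytruncate in place of create/postrotate
            systemctl reload agent-zero > /dev/null 2>&1 || true
        endscript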
    }
    
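    # Caddy's file log writer also rotates by size on its own by default; to let
    # logrotate own rotation, add roll_disabled to the log output in the Caddyfile.
    /var/log/caddy/*.log {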
        daily
        rotate 14
        compress
        delaycompress
        missingok
        notifempty
        create 640 caddy caddy
        postrotate
            systemctl reload caddy > /dev/null 2>&1 || true
        endscript
    }
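
    Because the Caddy log is written as JSON, it is easy to summarize with a few lines of Python. The sketch below counts responses per status code; it assumes each access-log entry carries a top-level integer "status" field, which is worth confirming against a line from your own log. The function name summarize_access_log is hypothetical.

```python
import json
from collections import Counter


def summarize_access_log(lines) -> Counter:
    """Count responses per HTTP status code from Caddy JSON access-log lines."""
    statuses = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial lines, e.g. one cut off by rotation
        if not isinstance(entry, dict):
            continue
        # Assumption: access-log entries have a top-level integer "status".
        status = entry.get("status")
        if isinstance(status, int):
            statuses[status] += 1
    return statuses
```

    Opening /var/log/caddy/agent-zero.log and passing the file object to summarize_access_log gives a quick view of traffic; a sudden rise in 401 or 403 responses suggests someone is probing the endpoint.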

    Health Check Script

    ~/agent-zero/healthcheck.sh
    #!/bin/bash
    
    ENDPOINT="http://localhost:5000"
    WEBHOOK_URL="https://your-alerting-service/webhook"  # Optional
    LOG_FILE="/var/log/agent-zero/healthcheck.log"
    
    # --max-time keeps a hung service from stalling the check; curl prints 000 if no response arrives
    response=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" "$ENDPOINT")
    
    if [ "$response" != "200" ]; then
        echo "$(date): Agent Zero health check failed with status $response" >> "$LOG_FILE"
        
        # Attempt restart (when run from cron, this needs a passwordless sudo rule for the command)
        sudo systemctl restart agent-zero
        
        # Optional: Send alert
        # curl -X POST -H "Content-Type: application/json" -d '{"text":"Agent Zero health check failed"}' "$WEBHOOK_URL"
    fi

    Make the script executable and schedule it:

    chmod +x ~/agent-zero/healthcheck.sh
    
    # Add to crontab (run every 5 minutes)
    crontab -e
    # Add: */5 * * * * /home/agentzero/agent-zero/healthcheck.sh

    7. Backup Strategy

    Comprehensive Backup Script

    ~/agent-zero/backup-full.sh
    #!/bin/bash
    set -euo pipefail
    
    BACKUP_DIR="/home/agentzero/backups"
    DATE=$(date +%Y%m%d_%H%M%S)
    BACKUP_NAME="agent-zero-full_$DATE"
    
    # The archive includes .env (API keys), so create files readable by the owner only
    umask 077
    mkdir -p "$BACKUP_DIR"
    
    echo "Starting full backup: $BACKUP_NAME"
    
    # Stop service for consistent backup; the trap restarts it if a later step fails
    sudo systemctl stop agent-zero
    trap 'sudo systemctl start agent-zero' EXIT
    
    # Backup memory, knowledge, and configuration
    # (GNU tar applies --exclude only to paths listed after it, so it goes first)
    tar -czf "$BACKUP_DIR/$BACKUP_NAME.tar.gz" \
        --exclude='*.pyc' \
        --exclude='__pycache__' \
        -C /home/agentzero/agent-zero \
        memory/ \
        knowledge/ \
        .env \
        prompts/
    
    # Restart service and clear the safety trap
    sudo systemctl start agent-zero
    trap - EXIT
    
    # Calculate checksum
    sha256sum "$BACKUP_DIR/$BACKUP_NAME.tar.gz" > "$BACKUP_DIR/$BACKUP_NAME.sha256"
    
    # Cleanup old backups (keep 30 days)
    find "$BACKUP_DIR" -name "agent-zero-full_*.tar.gz" -mtime +30 -delete
    find "$BACKUP_DIR" -name "agent-zero-full_*.sha256" -mtime +30 -delete
    
    echo "Backup completed: $BACKUP_NAME.tar.gz"
    echo "Size: $(du -h "$BACKUP_DIR/$BACKUP_NAME.tar.gz" | cut -f1)"

    Make the script executable and schedule it:

    chmod +x ~/agent-zero/backup-full.sh
    
    # Schedule daily backup at 3 AM
    crontab -e
    # Add: 0 3 * * * /home/agentzero/agent-zero/backup-full.sh >> /var/log/agent-zero/backup.log 2>&1
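
    A backup only counts once it has been checked. As a minimal sketch, the following compares an archive against the .sha256 file written by the backup script and then lists its contents to confirm it opens as a gzip-compressed tar. The function names sha256_of and verify_backup are hypothetical.

```python
import hashlib
import tarfile


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 hex digest of a file without loading it all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(archive_path: str, checksum_path: str) -> list:
    """Check the archive against its checksum file and return the member names."""
    with open(checksum_path, encoding="utf-8") as handle:
        # sha256sum writes "<hex digest>  <file name>"; the digest is the first field.
        expected = handle.read().split()[0]
    actual = sha256_of(archive_path)
    if actual != expected:
        raise ValueError(f"checksum mismatch: expected {expected}, got {actual}")
    # Listing the members confirms the archive can be opened and read.
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()
```

    The returned list should include memory, knowledge, prompts, and .env. A periodic restore into a scratch directory is still the stronger test.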

    Off-Site Backup

    # Using rclone (supports S3, Backblaze, Google Cloud, etc.)
    sudo apt install rclone -y
    rclone config  # Follow prompts to configure your remote
    
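    # Add to the end of the backup script (upload the checksum alongside the archive)
    rclone copy "$BACKUP_DIR/$BACKUP_NAME.tar.gz" remote:agent-zero-backups/
    rclone copy "$BACKUP_DIR/$BACKUP_NAME.sha256" remote:agent-zero-backups/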

    8. Update Procedures

    Updating Agent Zero

    cd ~/agent-zero
    
    # Create backup first
    ./backup-full.sh
    
    # Stop service
    sudo systemctl stop agent-zero
    
    # Pull latest code
    git fetch origin
    git checkout main
    git pull origin main
    
    # Review changelog for breaking changes before upgrading dependencies
    head -100 CHANGELOG.md
    
    # Update dependencies
    source venv/bin/activate
    pip install -r requirements.txt --upgrade
    
    # Start service
    sudo systemctl start agent-zero
    
    # Verify functionality
    curl -I http://localhost:5000

    Updating Docker Images

    # Pull latest execution image
    docker pull frdel/agent-zero-exe:latest
    
    # Remove old container (Agent Zero will recreate it)
    docker stop agent-zero-exe
    docker rm agent-zero-exe
    
    # Restart Agent Zero to use new image
    sudo systemctl restart agent-zero

    Updating Ollama Models

    # Update Ollama itself
    curl -fsSL https://ollama.com/install.sh | sh
    
    # Update models
    ollama pull qwen2.5:7b
    ollama pull nomic-embed-text
    
    # Verify
    ollama list

    9. Troubleshooting Production Issues

    High Memory Usage

    # Check what's consuming memory
    ps aux --sort=-%mem | head -20
    
    # Check Docker containers
    docker stats --no-stream
    
    # If Ollama is the culprit, unload the model from memory (keep_alive 0 evicts it immediately)
    curl http://localhost:11434/api/generate -d '{"model": "qwen2.5:7b", "keep_alive": 0}'

    Slow Response Times

    # Check system load
    uptime
    top -bn1 | head -20
    
    # Check if swapping
    vmstat 1 5
    
    # Check Docker container health
    docker logs agent-zero-exe --tail 100
    
    # Check Ollama inference speed
    time ollama run qwen2.5:7b "Hello" --verbose

    Service Won't Start

    # Check service status
    sudo systemctl status agent-zero
    
    # View recent logs
    journalctl -u agent-zero -n 100 --no-pager
    
    # Check for port conflicts (match the port number, not any line containing 5000)
    sudo ss -tlnp | grep ':5000'
    
    # Verify permissions
    ls -la ~/agent-zero/.env
    ls -la ~/agent-zero/memory/

    Docker Container Issues

    # List all containers including stopped
    docker ps -a
    
    # Check container logs
    docker logs agent-zero-exe --tail 200
    
    # Inspect container
    docker inspect agent-zero-exe
    
    # Force recreate
    docker stop agent-zero-exe
    docker rm agent-zero-exe
    sudo systemctl restart agent-zero

    10. Production Checklist

    Before going live, verify:

    • Reverse proxy configured with valid SSL certificate
    • Authentication enabled (basic auth or IP restriction)
    • Agent Zero bound to localhost only
    • Firewall rules allow only necessary ports
    • Docker resource limits configured
    • System swap configured
    • Log rotation enabled
    • Automated backups running and tested
    • Health check monitoring active
    • API keys have spending limits set
    • Memory and knowledge directories backed up
    • Update procedure documented and tested

    Series Complete!

    Over this six-part series, you've built a production-ready Agent Zero deployment:

    1. Environment Setup — Secured Ubuntu server with hardened SSH, firewall, and fail2ban
    2. Core Deployment — Docker installation and Agent Zero framework setup
    3. LLM Configuration — Cloud providers and local Ollama models
    4. Memory & Knowledge — Persistent learning with private web search
    5. Multi-Agent Systems — Agent hierarchies and custom tools
    6. Production Hardening — Reverse proxy, authentication, monitoring, and operations

    Your self-hosted AI agent infrastructure is now under your own control, and with local Ollama models it can run without depending on third-party AI services, giving you a degree of privacy and control that hosted AI platforms do not offer.