Production Hardening & Monitoring
Deploy Agent Zero behind a Caddy reverse proxy with SSL, authentication, Docker security hardening, resource limits, and comprehensive monitoring for production workloads.
Running Agent Zero for personal experimentation is one thing. Running it reliably for production workloads—where downtime costs you and security breaches could expose sensitive data—requires additional hardening. This guide covers reverse proxy configuration, SSL/TLS, authentication, resource management, monitoring, and operational best practices.
Reverse Proxy with Caddy
Exposing Agent Zero directly on port 5000 works for testing, but production deployments should sit behind a reverse proxy. Caddy handles SSL certificates automatically and provides a clean security boundary.
Install Caddy
sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update
sudo apt install caddy -y
Configure Caddy with Automatic HTTPS
Create or edit /etc/caddy/Caddyfile:
agent.yourdomain.com {
    reverse_proxy localhost:5000
    encode gzip
    header {
        # Security headers
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Content-Type-Options "nosniff"
        X-Frame-Options "DENY"
        Referrer-Policy "strict-origin-when-cross-origin"
        # Remove server identification
        -Server
    }
    log {
        output file /var/log/caddy/agent-zero.log
        format json
    }
}
Replace agent.yourdomain.com with your actual domain. Ensure DNS points to your server's IP.
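Once Caddy is serving the site, it helps to confirm that the four security headers from the Caddyfile actually reach clients. The helper below is a minimal sketch, not part of Caddy or Agent Zero: the function name missing_security_headers is made up for this example, and it only checks that each header is present, not what value it carries.

```shell
#!/usr/bin/env bash
# Reads raw HTTP response headers on stdin and prints the name of every
# security header from the Caddyfile above that is absent.
# (missing_security_headers is a hypothetical helper name.)
missing_security_headers() {
  local headers required name
  # Normalise line endings and case so HTTP/1.1 and HTTP/2 output both match.
  headers=$(tr -d '\r' | tr '[:upper:]' '[:lower:]')
  required="strict-transport-security x-content-type-options x-frame-options referrer-policy"
  for name in $required; do
    if ! grep -q "^${name}:" <<<"$headers"; then
      echo "$name"
    fi
  done
}

# Usage (the hostname is the placeholder used throughout this guide):
#   curl -sI https://agent.yourdomain.com | missing_security_headers
```

Empty output means all four headers were found. Any names printed point at a header block that Caddy did not load, for example because the service was not restarted after the edit.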
Restrict Agent Zero to Localhost
Now that Caddy handles external traffic, bind Agent Zero only to localhost. Edit .env:
WEB_UI_HOST=127.0.0.1
WEB_UI_PORT=5000
Restart both services:
sudo systemctl restart agent-zero
sudo systemctl restart caddy
Update Firewall
Remove direct access to port 5000 and ensure HTTP/HTTPS are open:
sudo ufw delete allow 5000/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw reload
# Verify HTTPS works
curl -I https://agent.yourdomain.com
Authentication Setup
Agent Zero doesn't include built-in authentication. Add a layer through Caddy.
Basic Authentication
Generate a password hash:
caddy hash-password
Enter your desired password when prompted. Copy the output hash.
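A hash that loses characters during copy and paste will make every login fail, so a quick format check before editing the Caddyfile can save a lockout. This is a sketch only: is_bcrypt_hash is a hypothetical helper, and it assumes the bcrypt layout shown in this guide's example hash (a $2a$-style prefix, a two-digit cost, then 53 salt-and-hash characters). It validates the shape of the string, not the password behind it.

```shell
#!/usr/bin/env bash
# Succeeds when the argument has the shape of a complete bcrypt hash:
# "$2a$", "$2b$" or "$2y$", a two-digit cost, then 53 characters of
# salt and hash drawn from the bcrypt base64 alphabet.
# (is_bcrypt_hash is a hypothetical helper name.)
is_bcrypt_hash() {
  local re='^\$2[aby]\$[0-9]{2}\$[./A-Za-z0-9]{53}$'
  [[ $1 =~ $re ]]
}

# Usage (single quotes stop the shell from expanding the "$" signs):
#   is_bcrypt_hash '<paste the hash here>' && echo "hash looks complete"
```

The placeholder $2a$14$your-hashed-password-here used in the examples fails this check, which is the intended behaviour: it must be replaced with real output from caddy hash-password.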
Update the Caddyfile:
agent.yourdomain.com {
    # Caddy 2.8 and later name this directive basic_auth; basicauth is the older, deprecated spelling
    basicauth {
        admin $2a$14$your-hashed-password-here
    }
    reverse_proxy localhost:5000
    encode gzip
    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Content-Type-Options "nosniff"
        X-Frame-Options "DENY"
        Referrer-Policy "strict-origin-when-cross-origin"
        -Server
    }
    log {
        output file /var/log/caddy/agent-zero.log
        format json
    }
}
sudo systemctl reload caddy
IP Allowlisting
For maximum security, restrict access to specific IP addresses:
agent.yourdomain.com {
    @blocked not remote_ip 203.0.113.0/24 198.51.100.50
    respond @blocked "Access denied" 403
    reverse_proxy localhost:5000
    # ... rest of config
}
Combining Methods
Use both authentication and IP restrictions:
agent.yourdomain.com {
    @allowed remote_ip 203.0.113.0/24 198.51.100.50
    handle @allowed {
        basicauth {
            admin $2a$14$your-hashed-password-here
        }
        reverse_proxy localhost:5000
    }
    handle {
        respond "Access denied" 403
    }
    # ... headers and logging
}
Docker Security Hardening
Agent Zero executes arbitrary code inside Docker containers. Harden this execution environment.
Resource Limits (Daemon-Level)
Prevent runaway containers from exhausting file descriptors, process slots, or disk space (through unbounded logs) by setting daemon-wide defaults in /etc/docker/daemon.json:
{
    "default-ulimits": {
        "nofile": {
            "Name": "nofile",
            "Hard": 65536,
            "Soft": 65536
        },
        "nproc": {
            "Name": "nproc",
            "Hard": 4096,
            "Soft": 4096
        }
    },
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "100m",
        "max-file": "3"
    }
}
sudo systemctl restart docker
Container-Specific Limits
Configure resource limits for the Agent Zero execution container in .env:
# Docker Resource Limits
CODE_EXEC_DOCKER_CPU_LIMIT=2.0
CODE_EXEC_DOCKER_MEMORY_LIMIT=2g
CODE_EXEC_DOCKER_PIDS_LIMIT=256
Network Isolation
For sensitive environments, restrict container network access:
# Create isolated network
docker network create --internal agent-zero-internal
# Configure in .env
CODE_EXEC_DOCKER_NETWORK=agent-zero-internal
With --internal, containers can communicate with each other but cannot reach the internet.
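It is worth confirming that the isolation holds rather than trusting the flag. The sketch below makes two assumptions that are not part of this guide: the probe uses the public alpine image, and example.com stands in for "the internet". Both function names are hypothetical. The first function reads the Internal field that docker network inspect reports; the second starts a throwaway container on the network and expects an outbound request to fail.

```shell
#!/usr/bin/env bash
# Succeeds when Docker reports the named network as internal-only.
network_is_internal() {
  [ "$(docker network inspect -f '{{.Internal}}' "$1" 2>/dev/null)" = "true" ]
}

# Succeeds when a throwaway container on the network cannot fetch a public URL.
# Returns 2 if the probe container cannot start at all, so that a Docker
# failure is not mistaken for a blocked connection.
# (alpine and example.com are illustrative choices, not requirements.)
network_blocks_egress() {
  docker run --rm --network "$1" alpine true || return 2
  ! docker run --rm --network "$1" alpine \
      wget -q -T 5 -O /dev/null http://example.com
}

# Usage:
#   network_is_internal agent-zero-internal && echo "marked internal"
#   network_blocks_egress agent-zero-internal && echo "egress blocked"
```

The second check takes a few seconds because the request has to time out or fail name resolution before the function returns.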
Read-Only Root Filesystem
For additional hardening:
CODE_EXEC_DOCKER_READ_ONLY=true
CODE_EXEC_DOCKER_TMPFS=/tmp:rw,size=512m
System Resource Management
Memory Limits for Agent Zero
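Add memory limits to the [Service] section of the agent-zero systemd unit, then apply them with sudo systemctl daemon-reload followed by a service restart:
[Service]
MemoryMax=4G
MemoryHigh=3G
Swap Configuration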
Ensure adequate swap for memory-intensive operations:
# Check current swap
free -h
# Create 4GB swap file if needed
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Make permanent
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
Process Limits
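Prevent fork bombs and runaway processes by adding per-user limits to /etc/security/limits.conf. These PAM limits apply to login sessions for the agentzero user; a process started by systemd takes its limits from the unit's LimitNPROC= and LimitNOFILE= settings instead.
agentzero soft nproc 256
agentzero hard nproc 512
agentzero soft nofile 65536
agentzero hard nofile 65536
Monitoring with Netdata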
Netdata provides real-time monitoring with minimal configuration:
curl https://get.netdata.cloud/kickstart.sh | bash
Access the dashboard at http://your-server-ip:19999. Configure Caddy to proxy it securely:
monitoring.yourdomain.com {
    basicauth {
        admin $2a$14$your-hashed-password-here
    }
    reverse_proxy localhost:19999
}
What Netdata Monitors
- CPU, memory, disk, and network usage
- Docker container metrics
- Process-level resource consumption
- System health alerts
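Netdata watches these metrics continuously. As a lightweight fallback for a cron job or a quick manual check, the sketch below flags filesystems that are filling up. It is independent of Netdata and makes no use of its API; the function name disks_over_threshold and the 85% figure in the usage line are illustrative choices, and mount points containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Reads `df -P` style output on stdin and prints "mountpoint usage%" for
# every filesystem whose usage is at or above the given whole-number
# percentage. (disks_over_threshold is a hypothetical helper name.)
disks_over_threshold() {
  local threshold=$1
  awk -v limit="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)              # "90%" -> "90"
    if (use + 0 >= limit + 0) print $6, $5
  }'
}

# Usage:
#   df -P | disks_over_threshold 85
```

The -P flag asks df for the POSIX output format, which keeps each filesystem on a single line so the column positions stay predictable.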
Logging & Health Checks
Structured Application Logging
{
    "version": 1,
    "disable_existing_loggers": false,
    "formatters": {
        "json": {
            "class": "pythonjsonlogger.jsonlogger.JsonFormatter",
            "format": "%(asctime)s %(name)s %(levelname)s %(message)s"
        }
    },
    "handlers": {
        "file": {
            "class": "logging.handlers.RotatingFileHandler",
            "filename": "/var/log/agent-zero/app.log",
            "maxBytes": 104857600,
            "backupCount": 5,
            "formatter": "json"
        }
    },
    "root": {
        "level": "INFO",
        "handlers": ["file"]
    }
}

sudo mkdir -p /var/log/agent-zero
sudo chown agentzero:agentzero /var/log/agent-zero
cd ~/agent-zero
source venv/bin/activate
pip install python-json-logger
Log Rotation
/var/log/agent-zero/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    create 640 agentzero agentzero
    postrotate
        systemctl reload agent-zero > /dev/null 2>&1 || true
    endscript
}
/var/log/caddy/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    create 640 caddy caddy
    postrotate
        systemctl reload caddy > /dev/null 2>&1 || true
    endscript
}
Health Check Script
#!/bin/bash
ENDPOINT="http://localhost:5000"
WEBHOOK_URL="https://your-alerting-service/webhook" # Optional
# --max-time keeps a hung server from stalling the check; curl reports 000 when it cannot connect
response=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" "$ENDPOINT")
if [ "$response" != "200" ]; then
    echo "$(date): Agent Zero health check failed with status $response" >> /var/log/agent-zero/healthcheck.log
    # Attempt restart
    sudo systemctl restart agent-zero
    # Optional: Send alert
    # curl -X POST -H "Content-Type: application/json" -d '{"text":"Agent Zero health check failed"}' "$WEBHOOK_URL"
fi

Save the script as ~/agent-zero/healthcheck.sh, then make it executable and schedule it:
chmod +x ~/agent-zero/healthcheck.sh
# Add to crontab (run every 5 minutes)
crontab -e
# Add: */5 * * * * /home/agentzero/agent-zero/healthcheck.sh
Backup Strategy
Comprehensive Backup Script
#!/bin/bash
set -e
BACKUP_DIR="/home/agentzero/backups"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_NAME="agent-zero-full_$DATE"
mkdir -p "$BACKUP_DIR"
echo "Starting full backup: $BACKUP_NAME"
# Stop service for consistent backup; the trap restarts it if a later step fails
sudo systemctl stop agent-zero
trap 'sudo systemctl start agent-zero' EXIT
# Backup memory, knowledge, and configuration
# (GNU tar applies --exclude only to paths listed after it, so the excludes come first)
tar -czf "$BACKUP_DIR/$BACKUP_NAME.tar.gz" \
    --exclude='*.pyc' \
    --exclude='__pycache__' \
    -C /home/agentzero/agent-zero \
    memory/ \
    knowledge/ \
    .env \
    prompts/
# Restart service and clear the safety trap
sudo systemctl start agent-zero
trap - EXIT
# Calculate checksum
sha256sum "$BACKUP_DIR/$BACKUP_NAME.tar.gz" > "$BACKUP_DIR/$BACKUP_NAME.sha256"
# Cleanup old backups (keep 30 days)
find "$BACKUP_DIR" -name "agent-zero-full_*.tar.gz" -mtime +30 -delete
find "$BACKUP_DIR" -name "agent-zero-full_*.sha256" -mtime +30 -delete
echo "Backup completed: $BACKUP_NAME.tar.gz"
echo "Size: $(du -h "$BACKUP_DIR/$BACKUP_NAME.tar.gz" | cut -f1)"

Save the script as ~/agent-zero/backup-full.sh, then make it executable and schedule it:
chmod +x ~/agent-zero/backup-full.sh
# Schedule daily backup at 3 AM
crontab -e
# Add: 0 3 * * * /home/agentzero/agent-zero/backup-full.sh >> /var/log/agent-zero/backup.log 2>&1
Off-Site Backup
# Using rclone (supports S3, Backblaze, Google Cloud, etc.)
sudo apt install rclone -y
rclone config # Follow prompts to configure your remote
# Add to backup script
rclone copy "$BACKUP_DIR/$BACKUP_NAME.tar.gz" remote:agent-zero-backups/
Update Procedures
Updating Agent Zero
cd ~/agent-zero
# Create backup first
./backup-full.sh
# Stop service
sudo systemctl stop agent-zero
# Pull latest code
git fetch origin
git checkout main
git pull origin main
# Update dependencies
source venv/bin/activate
pip install -r requirements.txt --upgrade
# Review changelog for breaking changes
head -100 CHANGELOG.md
# Start service
sudo systemctl start agent-zero
# Verify functionality
curl -I http://localhost:5000
Updating Docker Images
# Pull latest execution image
docker pull frdel/agent-zero-exe:latest
# Remove old container (Agent Zero will recreate it)
docker stop agent-zero-exe
docker rm agent-zero-exe
# Restart Agent Zero to use new image
sudo systemctl restart agent-zero
Updating Ollama Models
# Update Ollama itself
curl -fsSL https://ollama.com/install.sh | sh
# Update models
ollama pull qwen2.5:7b
ollama pull nomic-embed-text
# Verify
ollama list
Troubleshooting Production Issues
High Memory Usage
# Check what's consuming memory
ps aux --sort=-%mem | head -20
# Check Docker containers
docker stats --no-stream
# If Ollama is the culprit, reduce loaded models
curl http://localhost:11434/api/generate -d '{"model": "qwen2.5:7b", "keep_alive": 0}'
Slow Response Times
# Check system load
uptime
top -bn1 | head -20
# Check if swapping
vmstat 1 5
# Check Docker container health
docker logs agent-zero-exe --tail 100
# Check Ollama inference speed
time ollama run qwen2.5:7b "Hello" --verbose
Service Won't Start
# Check service status
sudo systemctl status agent-zero
# View recent logs
journalctl -u agent-zero -n 100 --no-pager
# Check for port conflicts
sudo ss -tlnp | grep 5000
# Verify permissions
ls -la ~/agent-zero/.env
ls -la ~/agent-zero/memory/
Docker Container Issues
# List all containers including stopped
docker ps -a
# Check container logs
docker logs agent-zero-exe --tail 200
# Inspect container
docker inspect agent-zero-exe
# Force recreate
docker stop agent-zero-exe
docker rm agent-zero-exe
sudo systemctl restart agent-zero
Production Checklist
Before going live, verify:
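Several of the steps from this guide can be verified from the shell. The script below is a minimal sketch under stated assumptions: the helper names (check, port_closed_in_ufw, run_prelaunch_checks) are made up for this example, the service names and port 5000 are the ones used earlier in this guide, and the selection of checks is illustrative rather than exhaustive. Authentication, backups, and log rotation still need a manual look.

```shell
#!/usr/bin/env bash
# Runs a command quietly and prints PASS or FAIL with a description.
# Usage: check "description" command [args...]
check() {
  local desc=$1; shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$desc"
  else
    printf 'FAIL  %s\n' "$desc"
    return 1
  fi
}

# Succeeds when `ufw status` lists no rule for the given port.
port_closed_in_ufw() {
  ! sudo ufw status | grep -Eq "^$1(/tcp)?[[:space:]]"
}

# Runs all checks for the given public hostname; returns 1 if any failed.
run_prelaunch_checks() {
  local domain=$1 failed=0
  check "agent-zero service is active" \
    systemctl is-active --quiet agent-zero || failed=1
  check "caddy service is active" \
    systemctl is-active --quiet caddy || failed=1
  check "Agent Zero answers on localhost:5000" \
    curl -fsS -o /dev/null --max-time 10 http://localhost:5000 || failed=1
  # No -f here: a 401 from basic auth still proves that TLS and Caddy work.
  check "HTTPS endpoint completes a request" \
    curl -sS -o /dev/null --max-time 10 "https://$domain" || failed=1
  check "port 5000 is not opened in ufw" \
    port_closed_in_ufw 5000 || failed=1
  return "$failed"
}

# Usage:
#   run_prelaunch_checks agent.yourdomain.com
```

Each line of output starts with PASS or FAIL, and the function's exit status can gate a deployment step or a cron alert.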
Series Complete!
Over this six-part series, you've built a production-ready Agent Zero deployment:
- Environment Setup — Secured Ubuntu server with hardened SSH, firewall, and fail2ban
- Core Deployment — Docker installation and Agent Zero framework setup
- LLM Configuration — Cloud providers and local Ollama models
- Memory & Knowledge — Persistent learning with private web search
- Multi-Agent Systems — Agent hierarchies and custom tools
- Production Hardening — Reverse proxy, authentication, monitoring, and operations
Your self-hosted AI agent infrastructure now runs independently of third-party services, with the privacy, control, and reliability that cloud AI platforms cannot provide.
