Getting containers running is one thing. Keeping them running reliably, securely, and efficiently on a production VPS is another challenge entirely. This guide covers the operational practices that separate a hobby deployment from a production-ready system.
Resource Monitoring
On a VPS with limited resources, visibility into what your containers are consuming is essential. A runaway container can starve others or crash your server.
Quick Monitoring with Docker Stats
The built-in docker stats command shows real-time resource usage for every running container:
docker stats

CONTAINER ID   NAME        CPU %   MEM USAGE / LIMIT   MEM %    NET I/O
a1b2c3d4e5f6   wordpress   0.50%   128MiB / 512MiB     25.00%   1.2MB / 800KB
f6e5d4c3b2a1   mariadb     1.20%   256MiB / 512MiB     50.00%   800KB / 1.2MB

Setting Resource Limits
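For scripting or a one-shot snapshot (from cron, say), docker stats accepts Go-template output. A minimal sketch using documented placeholders:

docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

The --no-stream flag prints a single sample and exits instead of refreshing continuously.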
Prevent any single container from monopolizing your VPS:
services:
app:
image: your-app:latest
deploy:
resources:
limits:
cpus: '1.0'
memory: 512M
reservations:
cpus: '0.25'
          memory: 128M

- limits: Hard ceiling; the container is throttled (CPU) or killed by the OOM killer (memory) if it exceeds them
- reservations: Guaranteed minimum resources
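To confirm a limit actually landed on a running container, read it back from the engine. A quick check (the container name my-app is a placeholder):

docker inspect --format '{{.HostConfig.Memory}}' my-app
# prints the memory limit in bytes; 0 means unlimited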
Lightweight Monitoring with cAdvisor
For a small VPS, a full Prometheus + Grafana stack might be overkill; cAdvisor alone gives you per-container metrics in a web UI:
services:
cadvisor:
image: gcr.io/cadvisor/cadvisor:latest
ports:
- "127.0.0.1:8080:8080"
volumes:
- /:/rootfs:ro
- /var/run:/var/run:ro
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
    restart: unless-stopped

Access the web UI at http://localhost:8080 through an SSH tunnel.
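A minimal sketch of the tunnel, run from your workstation (user@your-vps is a placeholder):

ssh -L 8080:127.0.0.1:8080 user@your-vps
# then browse http://localhost:8080 locally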
Full Monitoring Stack
For multiple services or when you need alerting:
services:
prometheus:
image: prom/prometheus:latest
ports:
- "127.0.0.1:9090:9090"
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
- prometheus-data:/prometheus
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.retention.time=15d'
restart: unless-stopped
grafana:
image: grafana/grafana:latest
ports:
- "127.0.0.1:3000:3000"
volumes:
- grafana-data:/var/lib/grafana
environment:
- GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
- GF_USERS_ALLOW_SIGN_UP=false
restart: unless-stopped
node-exporter:
image: prom/node-exporter:latest
ports:
- "127.0.0.1:9100:9100"
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
command:
- '--path.procfs=/host/proc'
- '--path.sysfs=/host/sys'
- '--path.rootfs=/rootfs'
restart: unless-stopped
volumes:
prometheus-data:
  grafana-data:

Create prometheus.yml:
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
- job_name: 'node'
static_configs:
- targets: ['node-exporter:9100']
- job_name: 'cadvisor'
static_configs:
      - targets: ['cadvisor:8080']

This stack uses around 300-500MB of RAM in total. (The cadvisor job assumes the cAdvisor container from earlier is attached to the same Compose network.)
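If you do want alerting, a Prometheus rule file is the first step (actual delivery still needs an Alertmanager). A hedged sketch of an alerts.yml, wired in via rule_files in prometheus.yml, using standard node-exporter metrics:

groups:
  - name: vps
    rules:
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Host memory usage above 90% for 5 minutes"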
Logging Strategies
Container logs can quickly fill your disk if left unchecked: Docker's default json-file logging driver does no rotation, so logs grow for as long as the container exists.
Configure Log Rotation
Set global defaults in /etc/docker/daemon.json:
{
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
}
}

Restart Docker to apply:

sudo systemctl restart docker

This caps each container at 30MB of logs (3 files × 10MB). The defaults apply only to newly created containers, so recreate existing ones to pick them up.
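To see how much space a container's log currently occupies (the JSON log file is root-owned; my-container is a placeholder):

sudo du -h "$(docker inspect --format '{{.LogPath}}' my-container)"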
Per-Container Log Configuration
services:
app:
image: your-app:latest
logging:
driver: json-file
options:
max-size: "50m"
        max-file: "5"

Centralized Logging with Loki
For searching and analyzing logs across containers, pair Loki (storage and queries) with Promtail (log shipping). The stack below references a promtail.yml config file next to the Compose file.
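A minimal promtail.yml sketch (the push URL assumes Loki is reachable by its service name loki on the same Compose network; the path matches the container-log mount in the Compose file):

server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: docker
    static_configs:
      - targets: [localhost]
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*-json.log

With the config in place, the Compose stack: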
services:
loki:
image: grafana/loki:latest
ports:
- "127.0.0.1:3100:3100"
volumes:
- loki-data:/loki
restart: unless-stopped
promtail:
image: grafana/promtail:latest
volumes:
- /var/lib/docker/containers:/var/lib/docker/containers:ro
- /var/run/docker.sock:/var/run/docker.sock
- ./promtail.yml:/etc/promtail/promtail.yml:ro
command: -config.file=/etc/promtail/promtail.yml
restart: unless-stopped
volumes:
  loki-data:

Automated Updates with Watchtower
Keeping container images updated is critical for security patches. Watchtower automatically pulls new images and restarts containers.
Basic Watchtower Setup
services:
watchtower:
image: containrrr/watchtower
volumes:
- /var/run/docker.sock:/var/run/docker.sock
environment:
- WATCHTOWER_CLEANUP=true
- WATCHTOWER_SCHEDULE=0 0 4 * * *
    restart: unless-stopped

This checks for updates daily at 4 AM and removes old images after updating.
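To trigger a single check on demand instead of waiting for the schedule, Watchtower has a one-shot mode:

docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  containrrr/watchtower --run-once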
Selective Updates
Update only specific containers by labeling them:
services:
watchtower:
image: containrrr/watchtower
volumes:
- /var/run/docker.sock:/var/run/docker.sock
environment:
- WATCHTOWER_LABEL_ENABLE=true
- WATCHTOWER_CLEANUP=true
- WATCHTOWER_SCHEDULE=0 0 4 * * *
restart: unless-stopped
# This container will be auto-updated
nginx:
image: nginx:latest
labels:
- com.centurylinklabs.watchtower.enable=true
# This container will NOT be auto-updated
database:
image: postgres:16
labels:
      - com.centurylinklabs.watchtower.enable=false

When Not to Use Automatic Updates
Disable automatic updates for:
- Databases: Schema changes in new versions can break things
- Stateful applications: Where updates require migration steps
- Production-critical services: Where you want to test updates first (see the manual flow below)
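When you do update one of these, do it deliberately. A minimal manual flow, after reviewing release notes and taking a backup (the service name database matches the example below):

docker compose pull database
docker compose up -d database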
For these, pin specific versions:
services:
database:
    image: postgres:16.1 # Pinned version, not :latest or :16

Backup Strategies
Containers are ephemeral, but data isn't. A solid backup strategy covers both volumes and configuration.
Direct Volume Backup
# Stop the container first for consistency
docker compose stop db
# Create a backup
docker run --rm \
-v db-data:/source:ro \
-v $(pwd)/backups:/backup \
alpine tar czf /backup/db-backup-$(date +%Y%m%d).tar.gz -C /source .
# Restart
docker compose start db

Database-Specific Dumps (Preferred)
# PostgreSQL
docker compose exec db pg_dump -U postgres mydb > backup.sql
# MySQL/MariaDB
docker compose exec db mariadb-dump -u root -p mydb > backup.sql
# MongoDB
docker compose exec mongo mongodump --out /backup

Database dumps are more portable than raw volume copies and can usually be restored into a different database version.
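For scheduled dumps, compress and date-stamp them. Note the -T flag, which disables TTY allocation so the pipe carries clean output:

docker compose exec -T db pg_dump -U postgres mydb | gzip > "mydb-$(date +%Y%m%d).sql.gz"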
Automated Backup Script
Create /opt/docker-backup.sh:
#!/bin/bash
set -e
BACKUP_DIR="/opt/backups"
RETENTION_DAYS=7
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"
# Backup Docker volumes
echo "Backing up Docker volumes..."
for volume in $(docker volume ls -q); do
echo " Backing up $volume"
docker run --rm \
-v "$volume":/source:ro \
-v "$BACKUP_DIR":/backup \
alpine tar czf "/backup/${volume}_${DATE}.tar.gz" -C /source .
done
# Backup Compose files
echo "Backing up Compose configurations..."
tar czf "$BACKUP_DIR/compose-configs_${DATE}.tar.gz" /opt/docker-apps/
# Clean old backups
echo "Removing backups older than $RETENTION_DAYS days..."
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +$RETENTION_DAYS -delete
echo "Backup complete!"Schedule with cron:
chmod +x /opt/docker-backup.sh
# Run daily at 2 AM
echo "0 2 * * * root /opt/docker-backup.sh >> /var/log/docker-backup.log 2>&1" | sudo tee /etc/cron.d/docker-backupOff-Site Backups
Sync to S3-compatible object storage:
# Install rclone
curl https://rclone.org/install.sh | sudo bash
# Configure (interactive)
rclone config
# Add to backup script
rclone sync /opt/backups remote:docker-backups --transfers 4

Restoring from Backup
# Restore a volume
docker compose stop app
docker run --rm \
-v app-data:/target \
-v $(pwd)/backups:/backup:ro \
alpine sh -c "rm -rf /target/* && tar xzf /backup/app-data_20240115.tar.gz -C /target"
docker compose start app
# Restore a database dump
docker compose exec -T db psql -U postgres mydb < backup.sql

Security Hardening
Docker's defaults prioritize convenience over security. For production, tighten things up.
Run Containers as Non-Root
By default, processes inside containers run as root. If an attacker escapes the container, they have root on the host.
services:
app:
image: your-app:latest
user: "1000:1000" # Run as non-root user
nginx:
image: nginx:latest
user: "nginx"
postgres:
image: postgres:16
user: "postgres"Read-Only Filesystems
Prevent attackers from modifying the container filesystem:
services:
app:
image: your-app:latest
read_only: true
tmpfs:
- /tmp
- /var/run
volumes:
      - app-data:/data # Writable volume for legitimate data

The tmpfs mounts provide writable temporary directories in memory.
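tmpfs mounts are uncapped by default, so a runaway process can eat RAM. The long volume syntax lets you set a size limit; a sketch:

services:
  app:
    volumes:
      - type: tmpfs
        target: /tmp
        tmpfs:
          size: 67108864 # 64 MiB cap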
Drop Unnecessary Capabilities
Linux capabilities grant specific privileges. Drop all and add only what's needed:
services:
app:
image: your-app:latest
cap_drop:
- ALL
cap_add:
      - NET_BIND_SERVICE # Only if binding to ports < 1024

Network Segmentation
Isolate sensitive services:
services:
nginx:
networks:
- frontend
app:
networks:
- frontend
- backend
database:
networks:
- backend # Not accessible from nginx
networks:
frontend:
backend:
    internal: true # No external access at all

Protect the Docker Socket
For services that need Docker access, use socket proxies with limited permissions:
services:
docker-proxy:
image: tecnativa/docker-socket-proxy
environment:
- CONTAINERS=1
- IMAGES=1
- POST=0 # Read-only
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
networks:
- docker-api
app-needing-docker:
environment:
- DOCKER_HOST=tcp://docker-proxy:2375
networks:
- docker-api
networks:
docker-api:
    internal: true

Scan Images for Vulnerabilities
docker run --rm \
-v /var/run/docker.sock:/var/run/docker.sock \
  aquasec/trivy image your-image:tag

Security Checklist
- Containers run as non-root where possible
- Resource limits set on all containers
- Log rotation configured
- Unnecessary capabilities dropped
- Sensitive services on internal networks
- Docker socket only mounted when required
- Images from trusted sources only
- Regular backup schedule configured
- Update strategy defined (automatic or manual)
- Firewall rules limit exposed ports (see the check below)
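On the firewall point, remember that Docker writes its own iptables rules when publishing ports, so UFW rules alone may not block them; binding published ports to 127.0.0.1, as in the examples above, is the dependable approach. To audit what is actually listening:

docker ps --format "table {{.Names}}\t{{.Ports}}"
sudo ss -tlnp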
Health Checks and Self-Healing
Production containers should recover from transient failures automatically.
Define Health Checks
services:
web:
image: nginx:latest
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost/health"]
interval: 30s
timeout: 10s
retries: 3
      start_period: 40s

- test: Command to run (exit code 0 means healthy)
- interval: Time between checks
- timeout: Maximum time the check may take
- retries: Consecutive failures before the container is marked unhealthy
- start_period: Grace period during startup when failures don't count

Note that the test command runs inside the container, so the binary it calls must exist in the image; slim images (including the official nginx image) often ship without curl, so either install it in your Dockerfile or probe another way.
Common Health Check Patterns
# HTTP endpoint
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
# TCP port check
healthcheck:
test: ["CMD", "nc", "-z", "localhost", "5432"]
# PostgreSQL
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
# MariaDB (the official image ships healthcheck.sh; for MySQL, use mysqladmin ping instead)
healthcheck:
  test: ["CMD", "healthcheck.sh", "--connect", "--innodb_initialized"]
# Redis
healthcheck:
test: ["CMD", "redis-cli", "ping"]Combine with Restart Policies
services:
app:
restart: unless-stopped
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost/health"]
interval: 30s
      retries: 3

Note that a restart policy alone does not act on health status: Docker restarts a container only when its main process exits, so an unhealthy container keeps running. To restart unhealthy containers automatically, pair health checks with a watcher, as sketched below, or use an orchestrator.
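A hedged sketch using willfarrell/autoheal, a small community image that watches Docker health status and restarts containers carrying a matching label:

services:
  autoheal:
    image: willfarrell/autoheal
    restart: unless-stopped
    environment:
      - AUTOHEAL_CONTAINER_LABEL=autoheal # watch containers labeled autoheal=true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
  app:
    restart: unless-stopped
    labels:
      - autoheal=true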
Production Compose File Template
Here's a template incorporating all the production practices:
services:
app:
image: your-app:latest
restart: unless-stopped
user: "1000:1000"
read_only: true
tmpfs:
- /tmp
cap_drop:
- ALL
deploy:
resources:
limits:
cpus: '1.0'
memory: 512M
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 3
logging:
driver: json-file
options:
max-size: "10m"
max-file: "3"
networks:
- frontend
- backend
volumes:
- app-data:/data
database:
image: postgres:16
restart: unless-stopped
user: "postgres"
deploy:
resources:
limits:
memory: 1G
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 10s
timeout: 5s
retries: 5
logging:
driver: json-file
options:
max-size: "10m"
max-file: "3"
networks:
- backend
volumes:
- db-data:/var/lib/postgresql/data
environment:
POSTGRES_PASSWORD_FILE: /run/secrets/db_password
secrets:
- db_password
networks:
frontend:
backend:
internal: true
volumes:
app-data:
db-data:
secrets:
db_password:
    file: ./secrets/db_password.txt

Before first start, create the secret file with a strong password and restrict its permissions (chmod 600 ./secrets/db_password.txt).

What's Next
Your Docker deployments are now production-ready: monitored, logged, backed up, and secured. But Docker Compose has limits—it runs on a single host and lacks advanced orchestration features. In Part 4, we'll introduce Kubernetes concepts and explore when you might need to move beyond Compose to a full container orchestration platform.
