Part 3 of 7

    Docker in Production

    Monitoring, security, and maintenance practices that separate hobby deployments from production-ready systems.

    Monitoring
    Security
    Backups

    Getting containers running is one thing. Keeping them running reliably, securely, and efficiently on a production VPS is another challenge entirely. This guide covers the operational practices that separate a hobby deployment from a production-ready system.

    1

    Resource Monitoring

    On a VPS with limited resources, visibility into what your containers are consuming is essential. A runaway container can starve others or crash your server.

    Quick Monitoring with Docker Stats

    The built-in command provides real-time resource usage:

    Real-time stats
    docker stats
    Example output
    CONTAINER ID   NAME        CPU %     MEM USAGE / LIMIT     MEM %     NET I/O
    a1b2c3d4e5f6   wordpress   0.50%     128MiB / 512MiB       25.00%    1.2MB / 800KB
    f6e5d4c3b2a1   mariadb     1.20%     256MiB / 512MiB       50.00%    800KB / 1.2MB

    Setting Resource Limits

    Prevent any single container from monopolizing your VPS:

    Resource limits in Compose
    services:
      app:
        image: your-app:latest
        deploy:
          resources:
            limits:
              cpus: '1.0'
              memory: 512M
            reservations:
              cpus: '0.25'
              memory: 128M
    • limits: Hard ceiling—container is throttled (CPU) or killed (memory) if exceeded
    • reservations: Guaranteed minimum resources
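
    To confirm the limits actually took effect, you can inspect the running container (the container name below is illustrative; check docker ps for the name Compose generated):

    Verify applied limits
    # NanoCpus is in billionths of a CPU, Memory is in bytes
    docker inspect --format 'CPUs={{.HostConfig.NanoCpus}} Memory={{.HostConfig.Memory}}' myproject-app-1
    docker stats --no-stream myproject-app-1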

    Lightweight Monitoring with cAdvisor

    For a small VPS, a full Prometheus + Grafana stack might be overkill:

    cAdvisor setup
    services:
      cadvisor:
        image: gcr.io/cadvisor/cadvisor:latest
        ports:
          - "127.0.0.1:8080:8080"
        volumes:
          - /:/rootfs:ro
          - /var/run:/var/run:ro
          - /sys:/sys:ro
          - /var/lib/docker/:/var/lib/docker:ro
        restart: unless-stopped

    Access the web UI at http://localhost:8080 via SSH tunnel.
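
    Because the port is bound to 127.0.0.1 on the VPS, forward it to your workstation first (replace user@your-vps with your own login):

    SSH tunnel
    ssh -L 8080:127.0.0.1:8080 user@your-vps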

    Full Monitoring Stack

    For multiple services or when you need alerting:

    Prometheus + Grafana stack
    services:
      prometheus:
        image: prom/prometheus:latest
        ports:
          - "127.0.0.1:9090:9090"
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
          - prometheus-data:/prometheus
        command:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.retention.time=15d'
        restart: unless-stopped
    
      grafana:
        image: grafana/grafana:latest
        ports:
          - "127.0.0.1:3000:3000"
        volumes:
          - grafana-data:/var/lib/grafana
        environment:
          - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
          - GF_USERS_ALLOW_SIGN_UP=false
        restart: unless-stopped
    
      node-exporter:
        image: prom/node-exporter:latest
        ports:
          - "127.0.0.1:9100:9100"
        volumes:
          - /proc:/host/proc:ro
          - /sys:/host/sys:ro
          - /:/rootfs:ro
        command:
          - '--path.procfs=/host/proc'
          - '--path.sysfs=/host/sys'
          - '--path.rootfs=/rootfs'
        restart: unless-stopped
    
    volumes:
      prometheus-data:
      grafana-data:

    Create prometheus.yml:

    prometheus.yml
    global:
      scrape_interval: 15s
    
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
    
      - job_name: 'node'
        static_configs:
          - targets: ['node-exporter:9100']
    
      - job_name: 'cadvisor'
        static_configs:
          - targets: ['cadvisor:8080']

    This stack uses around 300-500MB of RAM in total. Note that the node-exporter and cadvisor targets in prometheus.yml are resolved by Compose service name, so Prometheus can only reach them if they run in the same Compose project (or share a Docker network).
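
    Once the stack is up, a quick way to confirm Prometheus is actually scraping its targets is to query its HTTP API from the VPS:

    Check scrape targets
    curl -s http://127.0.0.1:9090/api/v1/targets | grep -o '"health":"[^"]*"'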

    2

    Logging Strategies

    Container logs can quickly fill your disk if left unchecked. Docker's default logging driver keeps logs indefinitely.

    Configure Log Rotation

    Set global defaults in /etc/docker/daemon.json:

    /etc/docker/daemon.json
    {
      "log-driver": "json-file",
      "log-opts": {
        "max-size": "10m",
        "max-file": "3"
      }
    }

    Restart Docker to apply:

    Apply changes
    sudo systemctl restart docker

    This limits each container to 30MB of logs (3 files × 10MB). Existing containers need to be recreated.
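
    Recreating is a one-liner per Compose project; the new containers then pick up the daemon-wide defaults:

    Recreate containers
    docker compose up -d --force-recreate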

    Per-Container Log Configuration

    Override per container
    services:
      app:
        image: your-app:latest
        logging:
          driver: json-file
          options:
            max-size: "50m"
            max-file: "5"

    Centralized Logging with Loki

    For searching and analyzing logs across containers:

    Loki + Promtail
    services:
      loki:
        image: grafana/loki:latest
        ports:
          - "127.0.0.1:3100:3100"
        volumes:
          - loki-data:/loki
        restart: unless-stopped
    
      promtail:
        image: grafana/promtail:latest
        volumes:
          - /var/lib/docker/containers:/var/lib/docker/containers:ro
          - /var/run/docker.sock:/var/run/docker.sock
          - ./promtail.yml:/etc/promtail/promtail.yml:ro
        command: -config.file=/etc/promtail/promtail.yml
        restart: unless-stopped
    
    volumes:
      loki-data:
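
    The promtail.yml referenced above isn't shown here; a minimal sketch that tails every container's JSON log file from the mounted directory and pushes to the loki service could look like this (job label is illustrative):

    promtail.yml
    server:
      http_listen_port: 9080

    positions:
      filename: /tmp/positions.yaml

    clients:
      - url: http://loki:3100/loki/api/v1/push

    scrape_configs:
      - job_name: containers
        static_configs:
          - targets: [localhost]
            labels:
              job: docker
              __path__: /var/lib/docker/containers/*/*-json.log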

    3

    Automated Updates with Watchtower

    Keeping container images updated is critical for security patches. Watchtower automatically pulls new images and restarts containers.

    Basic Watchtower Setup

    Watchtower
    services:
      watchtower:
        image: containrrr/watchtower
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
        environment:
          - WATCHTOWER_CLEANUP=true
          - WATCHTOWER_SCHEDULE=0 0 4 * * *
        restart: unless-stopped

    This checks for updates daily at 4 AM and removes old images after updating.
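
    To trigger an immediate check instead of waiting for the schedule, Watchtower can also be run as a one-off with its --run-once flag:

    One-off update check
    docker run --rm \
      -v /var/run/docker.sock:/var/run/docker.sock \
      containrrr/watchtower --run-once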

    Selective Updates

    Update only specific containers by labeling them:

    Selective updates with labels
    services:
      watchtower:
        image: containrrr/watchtower
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
        environment:
          - WATCHTOWER_LABEL_ENABLE=true
          - WATCHTOWER_CLEANUP=true
          - WATCHTOWER_SCHEDULE=0 0 4 * * *
        restart: unless-stopped
    
      # This container will be auto-updated
      nginx:
        image: nginx:latest
        labels:
          - com.centurylinklabs.watchtower.enable=true
    
      # This container will NOT be auto-updated
      database:
        image: postgres:16
        labels:
          - com.centurylinklabs.watchtower.enable=false

    When Not to Use Automatic Updates

    Disable automatic updates for:

    • Databases: Schema changes in new versions can break things
    • Stateful applications: Where updates require migration steps
    • Production-critical services: Where you want to test updates first

    For these, pin specific versions:

    Pin versions
    services:
      database:
        image: postgres:16.1  # Pinned version, not :latest or :16
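
    Updating a pinned service then becomes a deliberate step: bump the tag in the Compose file, pull, and recreate only that service when you're ready:

    Manual update
    docker compose pull database
    docker compose up -d database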

    4

    Backup Strategies

    Containers are ephemeral, but data isn't. A solid backup strategy covers both volumes and configuration.

    Direct Volume Backup

    Backup a volume
    # Stop the container first for consistency
    docker compose stop db
    
    # Create a backup
    docker run --rm \
      -v db-data:/source:ro \
      -v $(pwd)/backups:/backup \
      alpine tar czf /backup/db-backup-$(date +%Y%m%d).tar.gz -C /source .
    
    # Restart
    docker compose start db

    Database-Specific Dumps (Preferred)

    Database dumps
    # PostgreSQL (-T disables the pseudo-TTY so the redirected dump stays clean)
    docker compose exec -T db pg_dump -U postgres mydb > backup.sql
    
    # MySQL/MariaDB (assumes MARIADB_ROOT_PASSWORD was set on the container)
    docker compose exec -T db sh -c 'mariadb-dump -u root -p"$MARIADB_ROOT_PASSWORD" mydb' > backup.sql
    
    # MongoDB
    docker compose exec mongo mongodump --out /backup

    Database dumps are more portable and can be restored to different versions.

    Automated Backup Script

    Create /opt/docker-backup.sh:

    /opt/docker-backup.sh
    #!/bin/bash
    set -e
    
    BACKUP_DIR="/opt/backups"
    RETENTION_DAYS=7
    DATE=$(date +%Y%m%d_%H%M%S)
    
    mkdir -p "$BACKUP_DIR"
    
    # Backup Docker volumes
    echo "Backing up Docker volumes..."
    for volume in $(docker volume ls -q); do
        echo "  Backing up $volume"
        docker run --rm \
            -v "$volume":/source:ro \
            -v "$BACKUP_DIR":/backup \
            alpine tar czf "/backup/${volume}_${DATE}.tar.gz" -C /source .
    done
    
    # Backup Compose files
    echo "Backing up Compose configurations..."
    tar czf "$BACKUP_DIR/compose-configs_${DATE}.tar.gz" /opt/docker-apps/
    
    # Clean old backups
    echo "Removing backups older than $RETENTION_DAYS days..."
    find "$BACKUP_DIR" -name "*.tar.gz" -mtime +$RETENTION_DAYS -delete
    
    echo "Backup complete!"

    Schedule with cron:

    Schedule backup
    chmod +x /opt/docker-backup.sh
    
    # Run daily at 2 AM
    echo "0 2 * * * root /opt/docker-backup.sh >> /var/log/docker-backup.log 2>&1" | sudo tee /etc/cron.d/docker-backup

    Off-Site Backups

    Sync to S3-compatible object storage:

    rclone sync
    # Install rclone
    curl https://rclone.org/install.sh | sudo bash
    
    # Configure (interactive)
    rclone config
    
    # Add to backup script
    rclone sync /opt/backups remote:docker-backups --transfers 4

    Restoring from Backup

    Restore commands
    # Restore a volume
    docker compose stop app
    docker run --rm \
      -v app-data:/target \
      -v $(pwd)/backups:/backup:ro \
      alpine sh -c "rm -rf /target/* && tar xzf /backup/app-data_20240115.tar.gz -C /target"
    docker compose start app
    
    # Restore a database dump
    docker compose exec -T db psql -U postgres mydb < backup.sql

    5

    Security Hardening

    Docker's defaults prioritize convenience over security. For production, tighten things up.

    Run Containers as Non-Root

    By default, processes inside containers run as root. If an attacker escapes the container, they have root on the host.

    Non-root user
    services:
      app:
        image: your-app:latest
        user: "1000:1000"  # Run as non-root user
        
      nginx:
        image: nginx:latest
        # the stock nginx image expects to start as root; a fully non-root
        # setup may need the nginxinc/nginx-unprivileged image instead
        user: "nginx"
        
      postgres:
        image: postgres:16
        user: "postgres"

    Read-Only Filesystems

    Prevent attackers from modifying the container filesystem:

    Read-only container
    services:
      app:
        image: your-app:latest
        read_only: true
        tmpfs:
          - /tmp
          - /var/run
        volumes:
          - app-data:/data  # Writable volume for legitimate data

    The tmpfs mounts provide writable temporary directories in memory.

    Drop Unnecessary Capabilities

    Linux capabilities grant specific privileges. Drop all and add only what's needed:

    Drop capabilities
    services:
      app:
        image: your-app:latest
        cap_drop:
          - ALL
        cap_add:
          - NET_BIND_SERVICE  # Only if binding to ports < 1024

    Network Segmentation

    Isolate sensitive services:

    Network isolation
    services:
      nginx:
        networks:
          - frontend
    
      app:
        networks:
          - frontend
          - backend
    
      database:
        networks:
          - backend  # Not accessible from nginx
    
    networks:
      frontend:
      backend:
        internal: true  # No external access at all

    Protect the Docker Socket

    For services that need Docker access, use socket proxies with limited permissions:

    Docker socket proxy
    services:
      docker-proxy:
        image: tecnativa/docker-socket-proxy
        environment:
          - CONTAINERS=1
          - IMAGES=1
          - POST=0  # Read-only
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock:ro
        networks:
          - docker-api
    
      app-needing-docker:
        environment:
          - DOCKER_HOST=tcp://docker-proxy:2375
        networks:
          - docker-api
    
    networks:
      docker-api:
        internal: true

    Scan Images for Vulnerabilities

    Trivy scan
    docker run --rm \
      -v /var/run/docker.sock:/var/run/docker.sock \
      aquasec/trivy image your-image:tag

    Security Checklist

    • Containers run as non-root where possible
    • Resource limits set on all containers
    • Log rotation configured
    • Unnecessary capabilities dropped
    • Sensitive services on internal networks
    • Docker socket only mounted when required
    • Images from trusted sources only
    • Regular backup schedule configured
    • Update strategy defined (automatic or manual)
    • Firewall rules limit exposed ports

    6

    Health Checks and Self-Healing

    Production containers should recover from transient failures automatically.

    Define Health Checks

    Health check configuration
    services:
      web:
        image: nginx:latest
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost/health"]
          interval: 30s
          timeout: 10s
          retries: 3
          start_period: 40s
    • test: Command run inside the container to check health; the binary it calls (curl here) must exist in the image, and exit code 0 means healthy
    • interval: Time between checks
    • timeout: Max time for check to complete
    • retries: Consecutive failures before unhealthy
    • start_period: Grace period for startup
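
    The current health state shows up in docker ps and can also be read directly:

    Check health status
    # only populated if the container defines a healthcheck
    docker inspect --format '{{.State.Health.Status}}' your-container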

    Common Health Check Patterns

    Health check patterns
    # HTTP endpoint
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
    
    # TCP port check
    healthcheck:
      test: ["CMD", "nc", "-z", "localhost", "5432"]
    
    # PostgreSQL
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
    
    # MySQL/MariaDB
    healthcheck:
      test: ["CMD", "healthcheck.sh", "--connect", "--innodb_initialized"]
    
    # Redis
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]

    Combine with Restart Policies

    Self-healing container
    services:
      app:
        restart: unless-stopped
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost/health"]
          interval: 30s
          retries: 3

    Note that the restart policy only restarts containers that crash or exit; plain Docker does not restart a container just because its health check fails. The unhealthy status shows up in docker ps, gates dependent services via depends_on, and tools like willfarrell/autoheal can watch it and restart the container for you.
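
    Health status is also what depends_on can wait on, so a dependent service only starts once its backing service passes its check; a small sketch with illustrative service names:

    Wait for a healthy dependency
    services:
      db:
        image: postgres:16
        healthcheck:
          test: ["CMD-SHELL", "pg_isready -U postgres"]
          interval: 10s
          retries: 5

      app:
        image: your-app:latest
        restart: unless-stopped
        depends_on:
          db:
            condition: service_healthy  # start only after db reports healthy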

    7

    Production Compose File Template

    Here's a template incorporating all the production practices:

    Production docker-compose.yml
    services:
      app:
        image: your-app:latest
        restart: unless-stopped
        user: "1000:1000"
        read_only: true
        tmpfs:
          - /tmp
        cap_drop:
          - ALL
        deploy:
          resources:
            limits:
              cpus: '1.0'
              memory: 512M
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
          interval: 30s
          timeout: 10s
          retries: 3
        logging:
          driver: json-file
          options:
            max-size: "10m"
            max-file: "3"
        networks:
          - frontend
          - backend
        volumes:
          - app-data:/data
    
      database:
        image: postgres:16
        restart: unless-stopped
        user: "postgres"
        deploy:
          resources:
            limits:
              memory: 1G
        healthcheck:
          test: ["CMD-SHELL", "pg_isready -U postgres"]
          interval: 10s
          timeout: 5s
          retries: 5
        logging:
          driver: json-file
          options:
            max-size: "10m"
            max-file: "3"
        networks:
          - backend
        volumes:
          - db-data:/var/lib/postgresql/data
        environment:
          POSTGRES_PASSWORD_FILE: /run/secrets/db_password
        secrets:
          - db_password
    
    networks:
      frontend:
      backend:
        internal: true
    
    volumes:
      app-data:
      db-data:
    
    secrets:
      db_password:
        file: ./secrets/db_password.txt
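
    Before deploying, it can help to validate the file and then bring the stack up:

    Validate and deploy
    docker compose config --quiet && docker compose up -d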

    What's Next

    Your Docker deployments are now production-ready: monitored, logged, backed up, and secured. But Docker Compose has limits—it runs on a single host and lacks advanced orchestration features. In Part 4, we'll introduce Kubernetes concepts and explore when you might need to move beyond Compose to a full container orchestration platform.