Part 7 of 10

    Backup & Disaster Recovery Automation

    Database dumps, file synchronization, S3 integration, encryption, retention policies, and automated restore testing. Because a backup you can't restore isn't a backup.

    Everyone knows they should have backups. Few people actually test restoring them. Even fewer have automated the entire process.

    This guide uses Claude Code to generate complete backup pipelines: database dumps, file synchronization, S3-compatible storage integration, encryption, retention policies, and—critically—automated restore testing.

    1. Prerequisites

    • Claude Code installed (see Part 1)
    • S3-compatible storage (RamNode Object Storage, Backblaze B2, Wasabi, or MinIO)
    • A server with data worth protecting

    2. Backup Strategy Overview

    A solid backup strategy follows the 3-2-1 rule:

    • 3 copies of your data
    • 2 different storage types
    • 1 offsite location

    We'll implement this with:

    • Local snapshots — fast recovery
    • Remote S3 storage — disaster recovery
    • Automated verification — trust but verify

    3. Database Backup Scripts

    Let's start with the most critical data—databases. Here's a prompt for a comprehensive backup script:

    "Create a comprehensive database backup script that handles PostgreSQL, MariaDB/MySQL, Redis, and MongoDB with parallel backups, zstd compression, age encryption, retention (7 daily, 4 weekly, 12 monthly), S3 upload, Discord notifications, logging, lock files, and checksums. Make it configurable via environment file."

    scripts/backup-databases.sh
    #!/bin/bash
    set -euo pipefail
    
    # =============================================================================
    # Database Backup Script
    # Supports: PostgreSQL, MySQL/MariaDB, Redis, MongoDB
    # =============================================================================
    
    SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
    CONFIG_FILE="${CONFIG_FILE:-$SCRIPT_DIR/.backup.env}"
    LOCK_FILE="/var/run/db-backup.lock"
    LOG_FILE="/var/log/backups/database-backup.log"
    
    # Load configuration
    if [[ -f "$CONFIG_FILE" ]]; then
        source "$CONFIG_FILE"
    else
        echo "ERROR: Configuration file not found: $CONFIG_FILE"
        exit 1
    fi
    
    # Defaults
    BACKUP_DIR="${BACKUP_DIR:-/var/backups/databases}"
    S3_BUCKET="${S3_BUCKET:-}"
    RETENTION_DAILY="${RETENTION_DAILY:-7}"
    RETENTION_WEEKLY="${RETENTION_WEEKLY:-4}"
    RETENTION_MONTHLY="${RETENTION_MONTHLY:-12}"
    ENCRYPTION_KEY="${ENCRYPTION_KEY:-}"
    
    # Timestamp formats
    TIMESTAMP=$(date +%Y%m%d_%H%M%S)
    DAY_OF_WEEK=$(date +%u)
    DAY_OF_MONTH=$(date +%d)
    
    log() {
        local level="$1"; shift
        echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*" | tee -a "$LOG_FILE"
    }
    
    compress_encrypt() {
        local input="$1"
        local output="$2"
        
        if [[ -n "$ENCRYPTION_KEY" ]]; then
            zstd -T0 -19 "$input" -c | age -r "$ENCRYPTION_KEY" -o "$output"
        else
            zstd -T0 -19 "$input" -o "$output"
        fi
        
        # Generate checksum
        sha256sum "$output" > "$output.sha256"
        rm -f "$input"
    }
    
    backup_postgresql() {
        local db="$1"
        local output_dir="$2"
        local dump_file="$output_dir/postgresql_${db}_${TIMESTAMP}.sql"
        local compressed_file="${dump_file}.zst"
        [[ -n "$ENCRYPTION_KEY" ]] && compressed_file="${dump_file}.zst.age"
        
        log "INFO" "Backing up PostgreSQL database: $db"
        
        PGPASSWORD="$POSTGRES_PASSWORD" pg_dump \
            -h "${POSTGRES_HOST:-localhost}" \
            -p "${POSTGRES_PORT:-5432}" \
            -U "${POSTGRES_USER:-postgres}" \
            -Fc --no-owner --no-acl \
            "$db" > "$dump_file"
        
        compress_encrypt "$dump_file" "$compressed_file"
        
        local size=$(du -h "$compressed_file" | cut -f1)
        log "OK" "PostgreSQL $db backed up ($size)"
    }
    
    # Similar functions for MySQL, Redis, MongoDB...
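
    The listing stops at a placeholder for the other engines. As one illustration, a MySQL/MariaDB variant might look like the sketch below; it reuses the compress_encrypt helper and the MYSQL_* variables from the env file that follows (the function name and dump flags are assumptions, not part of the generated script):

    backup_mysql() {
        local db="$1"
        local output_dir="$2"
        local dump_file="$output_dir/mysql_${db}_${TIMESTAMP}.sql"
        local compressed_file="${dump_file}.zst"
        [[ -n "$ENCRYPTION_KEY" ]] && compressed_file="${dump_file}.zst.age"

        log "INFO" "Backing up MySQL database: $db"

        # --single-transaction: consistent InnoDB snapshot without locking tables
        MYSQL_PWD="$MYSQL_PASSWORD" mysqldump \
            -h "${MYSQL_HOST:-localhost}" \
            -P "${MYSQL_PORT:-3306}" \
            -u "${MYSQL_USER:-root}" \
            --single-transaction --routines --triggers \
            "$db" > "$dump_file"

        compress_encrypt "$dump_file" "$compressed_file"

        local size=$(du -h "$compressed_file" | cut -f1)
        log "OK" "MySQL $db backed up ($size)"
    }
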
    .backup.env.example
    # =============================================================================
    # Database Backup Configuration
    # =============================================================================
    
    # Backup storage
    BACKUP_DIR=/var/backups/databases
    
    # S3 Configuration
    S3_BUCKET=my-backups
    S3_ENDPOINT=https://s3.us-east-1.wasabisys.com
    AWS_ACCESS_KEY_ID=your-access-key
    AWS_SECRET_ACCESS_KEY=your-secret-key
    
    # Encryption (age public key, generate with: age-keygen)
    ENCRYPTION_KEY=age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    
    # Retention (number to keep)
    RETENTION_DAILY=7
    RETENTION_WEEKLY=4
    RETENTION_MONTHLY=12
    
    # Notifications
    DISCORD_WEBHOOK=https://discord.com/api/webhooks/xxx/yyy
    
    # PostgreSQL
    POSTGRES_HOST=localhost
    POSTGRES_PORT=5432
    POSTGRES_USER=postgres
    POSTGRES_PASSWORD=your-password
    POSTGRES_DATABASES="app_production app_analytics"
    
    # MySQL/MariaDB
    MYSQL_HOST=localhost
    MYSQL_PORT=3306
    MYSQL_USER=root
    MYSQL_PASSWORD=your-password
    MYSQL_DATABASES="wordpress nextcloud"
    
    # Redis
    BACKUP_REDIS=true
    REDIS_HOST=localhost
    REDIS_PORT=6379
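
    The env file defines S3 credentials and an endpoint, but the upload step itself isn't shown in the excerpt above. A minimal helper the backup functions could call might look like this sketch, using the aws CLI with a custom endpoint (the function name and the daily/ prefix are illustrative):

    upload_to_s3() {
        local file="$1"
        [[ -z "$S3_BUCKET" ]] && return 0   # S3 upload not configured

        # The aws CLI reads credentials from the environment, so export them
        export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY

        log "INFO" "Uploading $(basename "$file") to s3://$S3_BUCKET/daily/"
        aws s3 cp "$file" "s3://$S3_BUCKET/daily/$(basename "$file")" \
            --endpoint-url "$S3_ENDPOINT"
        aws s3 cp "$file.sha256" "s3://$S3_BUCKET/daily/$(basename "$file").sha256" \
            --endpoint-url "$S3_ENDPOINT"
    }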

    4. File & Application Backup

    For application files, Docker volumes, and configurations—incremental backups with rsync:

    scripts/backup-files.sh
    #!/bin/bash
    set -euo pipefail
    
    # =============================================================================
    # File Backup Script
    # Incremental backups with rsync + full weekly snapshots
    # =============================================================================
    
    SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
    CONFIG_FILE="${CONFIG_FILE:-$SCRIPT_DIR/.backup-files.env}"
    LOG_FILE="/var/log/backups/file-backup.log"
    
    source "$CONFIG_FILE"
    
    TIMESTAMP=$(date +%Y%m%d_%H%M%S)
    DAY_OF_WEEK=$(date +%u)
    
    log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"; }
    
    backup_directory() {
        local name="$1"
        local source="$2"
        local dest_base="$BACKUP_DIR/$name"
        
        mkdir -p "$dest_base"/{current,snapshots}
        
        log "Backing up $name: $source"
        
        # Mirror to current/ (rsync transfers only changed files; --delete prunes removals)
        rsync -avz --delete \
            --exclude-from="$SCRIPT_DIR/exclude-patterns.txt" \
            ${BANDWIDTH_LIMIT:+--bwlimit=$BANDWIDTH_LIMIT} \
            "$source/" "$dest_base/current/"
        
        # Weekly full snapshot
        if [[ "$DAY_OF_WEEK" == "7" ]]; then
            local snapshot="$dest_base/snapshots/$TIMESTAMP"
            cp -al "$dest_base/current" "$snapshot"
            log "Created weekly snapshot: $snapshot"
            
            # Compress for offsite
            tar -cf - -C "$dest_base/snapshots" "$(basename "$snapshot")" | \
                zstd -T0 -19 > "$snapshot.tar.zst"
            
            if [[ -n "${ENCRYPTION_KEY:-}" ]]; then
                age -r "$ENCRYPTION_KEY" -o "$snapshot.tar.zst.age" "$snapshot.tar.zst"
                rm "$snapshot.tar.zst"
            fi
        fi
    }
    
    backup_docker_volume() {
        local volume="$1"
        local container="${2:-}"
        local dest="$BACKUP_DIR/docker-volumes"
        
        mkdir -p "$dest"
        
        # Stop container if specified
        if [[ -n "$container" ]]; then
            docker stop "$container" || true
        fi
        
        # Backup volume
        local backup_file="$dest/${volume}_${TIMESTAMP}.tar.zst"
        docker run --rm \
            -v "$volume":/source:ro \
            -v "$dest":/backup \
            alpine tar -cf - -C /source . | zstd -T0 -19 > "$backup_file"
        
        # Restart container
        if [[ -n "$container" ]]; then
            docker start "$container"
        fi
        
        # Encrypt if configured
        if [[ -n "${ENCRYPTION_KEY:-}" ]]; then
            age -r "$ENCRYPTION_KEY" -o "$backup_file.age" "$backup_file"
            rm "$backup_file"
        fi
    }
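
    Call sites for these functions aren't shown in the excerpt; invocations would look something like this (volume and container names are placeholders):

    # Stop the container for a consistent copy, back up the volume, restart it
    backup_docker_volume "nextcloud_data" "nextcloud"

    # Volumes that are safe to read while in use can skip the container stop
    backup_docker_volume "grafana_data"
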
    backup-sources.txt
    # Format: name:path
    configs:/etc
    nginx:/etc/nginx
    traefik:/opt/traefik
    scripts:/opt/scripts
    www:/var/www
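
    The backup_directory function can be driven straight from backup-sources.txt. A minimal reader loop, assuming the name:path format above and skipping comment lines, might look like:

    while IFS=: read -r name path; do
        # Skip blank lines and comment lines
        [[ -z "$name" || "$name" == \#* ]] && continue
        backup_directory "$name" "$path"
    done < "$SCRIPT_DIR/backup-sources.txt"
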
    exclude-patterns.txt
    # Exclude patterns for rsync
    node_modules/
    .git/
    __pycache__/
    *.pyc
    .cache/
    .npm/
    vendor/
    *.log
    *.tmp

    5. Automated Restore Testing

    Backups are worthless if you can't restore them. Here's automated verification:

    "Create an automated restore testing script that runs weekly in an isolated environment, tests PostgreSQL restore (create temp DB, restore, verify row counts), tests file restore (extract, verify checksums), tests Docker volume restore, reports results with timing, alerts on failure, and cleans up test environments."

    scripts/test-restore.sh
    #!/bin/bash
    set -euo pipefail
    
    # =============================================================================
    # Automated Backup Restore Testing
    # Verifies backups are actually restorable
    # =============================================================================
    
    SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
    CONFIG_FILE="${CONFIG_FILE:-$SCRIPT_DIR/.backup.env}"
    LOG_FILE="/var/log/backups/restore-test.log"
    TEST_DIR="/tmp/restore-test-$$"
    
    source "$CONFIG_FILE"
    mkdir -p "$TEST_DIR"
    
    RESULTS=()
    FAILURES=()
    
    log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"; }
    
    cleanup() {
        log "Cleaning up test environment..."
        rm -rf "$TEST_DIR"
        PGPASSWORD="$POSTGRES_PASSWORD" psql -h "$POSTGRES_HOST" -U "$POSTGRES_USER" \
            -c "DROP DATABASE IF EXISTS restore_test_db;" 2>/dev/null || true
        docker rm -f restore-test-container 2>/dev/null || true
    }
    trap cleanup EXIT
    
    test_postgresql_restore() {
        local backup_file="$1"
        local db_name="$2"
        
        log "Testing PostgreSQL restore: $db_name"
        local start_time=$(date +%s)
        
        # Decrypt if needed
        local restore_file="$TEST_DIR/pg_restore.dump"
        if [[ "$backup_file" == *.age ]]; then
            age -d -i "$AGE_KEY_FILE" -o "$TEST_DIR/pg_restore.dump.zst" "$backup_file"
            zstd -d "$TEST_DIR/pg_restore.dump.zst" -o "$restore_file"
        elif [[ "$backup_file" == *.zst ]]; then
            zstd -d "$backup_file" -o "$restore_file"
        fi
        
        # Create and restore test database
        PGPASSWORD="$POSTGRES_PASSWORD" psql -h "$POSTGRES_HOST" -U "$POSTGRES_USER" \
            -c "CREATE DATABASE restore_test_db;"
        
        PGPASSWORD="$POSTGRES_PASSWORD" pg_restore \
            -h "$POSTGRES_HOST" -U "$POSTGRES_USER" \
            -d restore_test_db --no-owner --no-acl \
            "$restore_file" 2>/dev/null || true
        
        # Verify table count
        local table_count=$(PGPASSWORD="$POSTGRES_PASSWORD" psql -h "$POSTGRES_HOST" -U "$POSTGRES_USER" \
            -d restore_test_db -t -c "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'public';")
        
        local end_time=$(date +%s)
        local duration=$((end_time - start_time))
        
        # Cleanup
        PGPASSWORD="$POSTGRES_PASSWORD" psql -h "$POSTGRES_HOST" -U "$POSTGRES_USER" \
            -c "DROP DATABASE restore_test_db;"
        
        if [[ "$table_count" -gt 0 ]]; then
            log "✓ PostgreSQL $db_name: ${table_count// /} tables (${duration}s)"
            RESULTS+=("PostgreSQL $db_name: OK")
        else
            log "✗ PostgreSQL $db_name: Restore failed"
            FAILURES+=("PostgreSQL $db_name")
        fi
    }
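
    The PostgreSQL check is shown in full; the file-restore check from the prompt follows the same shape. Here's a sketch that verifies the stored checksum and test-extracts an archive (it assumes an unencrypted .tar.zst with a .sha256 file beside it, as the database script produces; encrypted archives would need an age -d step first):

    test_file_restore() {
        local archive="$1"
        log "Testing file restore: $(basename "$archive")"

        # Verify the checksum recorded at backup time
        local expected actual
        expected=$(cut -d' ' -f1 "$archive.sha256")
        actual=$(sha256sum "$archive" | cut -d' ' -f1)

        # Test-extract into the scratch directory and count restored files
        local extract_dir="$TEST_DIR/files"
        mkdir -p "$extract_dir"
        zstd -dc "$archive" | tar -xf - -C "$extract_dir"

        local file_count
        file_count=$(find "$extract_dir" -type f | wc -l)

        if [[ "$expected" == "$actual" && "$file_count" -gt 0 ]]; then
            log "✓ File restore: checksum OK, $file_count files extracted"
            RESULTS+=("Files $(basename "$archive"): OK")
        else
            log "✗ File restore failed: $(basename "$archive")"
            FAILURES+=("Files $(basename "$archive")")
        fi
    }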

    6. Disaster Recovery Runbook

    Generate documentation for when things go wrong:

    Scenario               RTO          RPO
    Complete server loss   2-4 hours    24 hours
    Database corruption    30 min       1 hour
    Ransomware attack      4-8 hours    24 hours
    Accidental deletion    15 min       Varies
    Failed deployment      5 min        0

    RTO = Recovery Time Objective (how long to restore) | RPO = Recovery Point Objective (max data loss)

    Complete Server Loss Recovery

    Step 1: Provision new server
    # Via Terraform (if using IaC)
    cd infrastructure/terraform/environments/production
    terraform apply -target=module.webserver
    
    # Or manually provision via RamNode dashboard
    # Select: Ubuntu 24.04, same region, similar specs
    Step 2: Download and restore backups
    # Download from S3
    mkdir -p /restore && cd /restore
    
    export AWS_ACCESS_KEY_ID="your-key"
    export AWS_SECRET_ACCESS_KEY="your-secret"
    S3_ENDPOINT="https://your-s3-endpoint"
    
    # List and download latest backups
    aws s3 ls s3://backups/daily/ --endpoint-url=$S3_ENDPOINT
    aws s3 cp s3://backups/daily/postgresql_production_YYYYMMDD.sql.zst.age . \
        --endpoint-url=$S3_ENDPOINT
    
    # Decrypt
    age -d -i /secure/backup-key.txt -o postgresql_production.sql.zst \
        postgresql_production_YYYYMMDD.sql.zst.age
    
    # Decompress and restore
    zstd -d postgresql_production.sql.zst
    sudo -u postgres createdb production
    sudo -u postgres pg_restore -d production postgresql_production.sql
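
    Application files follow the same pattern as the database dumps; a sketch for the weekly snapshots produced by backup-files.sh, with illustrative object names and paths:

    Step 3: Restore application files
    # Download the latest weekly snapshot
    aws s3 cp s3://backups/files/www_YYYYMMDD_HHMMSS.tar.zst.age . \
        --endpoint-url=$S3_ENDPOINT
    
    # Decrypt and unpack (the archive contains a timestamped directory)
    age -d -i /secure/backup-key.txt www_YYYYMMDD_HHMMSS.tar.zst.age | \
        zstd -dc | tar -xf - -C /restore
    
    # Sync the unpacked snapshot into place
    rsync -a /restore/YYYYMMDD_HHMMSS/ /var/www/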

    7. Scheduling & Integration

    Set up automated scheduling with systemd timers:

    /etc/systemd/system/backup-databases.service
    [Unit]
    Description=Database Backup
    After=network-online.target
    Wants=network-online.target
    
    [Service]
    Type=oneshot
    User=root
    ExecStart=/opt/scripts/backup-databases.sh
    Environment=CONFIG_FILE=/opt/scripts/.backup.env

    /etc/systemd/system/backup-databases.timer
    [Unit]
    Description=Daily Database Backup
    
    [Timer]
    OnCalendar=*-*-* 02:00:00
    Persistent=true
    RandomizedDelaySec=300
    
    [Install]
    WantedBy=timers.target
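
    The files and restore-test jobs follow the same pattern; a weekly timer for the restore test might look like this sketch (the unit name matches the timer enabled below, the schedule is an assumption):

    /etc/systemd/system/test-restore.timer
    [Unit]
    Description=Weekly Backup Restore Test
    
    [Timer]
    OnCalendar=Sun *-*-* 04:00:00
    Persistent=true
    RandomizedDelaySec=600
    
    [Install]
    WantedBy=timers.target
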
    Enable timers
    sudo systemctl daemon-reload
    sudo systemctl enable --now backup-databases.timer
    sudo systemctl enable --now backup-files.timer
    sudo systemctl enable --now test-restore.timer
    
    # Verify
    sudo systemctl list-timers

    8. Tips for Reliable Backups

    • Test restores regularly. The script helps, but do a full manual restore quarterly.
    • Monitor backup jobs. Add Prometheus metrics for backup success/failure/duration (a minimal sketch follows this list).
    • Encrypt everything offsite. Use age or GPG—never store unencrypted backups remotely.
    • Version your backup scripts. Keep them in your infrastructure repo.
    • Document recovery procedures. The runbook saves hours during an actual incident.
    • Use separate credentials. Backup S3 credentials should be different from application credentials.
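
    For the monitoring tip above, one lightweight option is node_exporter's textfile collector: have the backup script drop a small metrics file when it finishes. A sketch (the collector directory is an assumption and must match your node_exporter flags):

    write_backup_metrics() {
        local status="$1"      # 1 = success, 0 = failure
        local duration="$2"    # seconds
        local dir="/var/lib/node_exporter/textfile_collector"

        printf 'backup_last_run_timestamp_seconds %s\nbackup_last_run_success %s\nbackup_last_run_duration_seconds %s\n' \
            "$(date +%s)" "$status" "$duration" > "$dir/backup.prom.$$"

        # Atomic rename so node_exporter never scrapes a half-written file
        mv "$dir/backup.prom.$$" "$dir/backup.prom"
    }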

    Quick Reference: Backup Prompts

    Need              Prompt Pattern
    Database backup   "Create backup script for [PostgreSQL/MySQL] with [encryption] and [retention]"
    File backup       "Generate rsync-based backup for [directories] with incremental and full snapshots"
    S3 upload         "Add S3 upload to backup script using [provider] with lifecycle rules"
    Restore test      "Create automated restore verification for [backup type]"
    Runbook           "Generate disaster recovery runbook for [scenario]"

    What's Next

    You now have automated, verified, recoverable backups. In Part 8 (coming soon), we'll cover Security Hardening & Compliance Scripts—generating firewall rules, audit configurations, and compliance checks with Claude Code.