    Disaster Recovery Planning

    Prepare for and recover from catastrophic failures

    Prepare for the unexpected with a solid disaster recovery plan. Learn how to protect your data and minimize downtime in case of failures or disasters.

    Key Recovery Metrics

    RTO

    Recovery Time Objective - How quickly must you recover?

    • Minutes: Mission-critical systems
    • Hours: Important business applications
    • Days: Non-critical systems

    RPO

    Recovery Point Objective - How much data can you afford to lose?

    • Zero: Continuous replication needed
    • Minutes: Very frequent backups
    • Hours/Days: Regular backups acceptable
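    One way to make an RPO target concrete is to compare the age of the newest backup against it. The following is a minimal sketch, not platform tooling: the function name rpo_status and its arguments are illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical helper: report whether the newest backup satisfies an RPO target.
# Arguments: backup time (epoch seconds), current time (epoch seconds), RPO (seconds).
rpo_status() {
  local backup_epoch="$1" now_epoch="$2" rpo_seconds="$3"
  local age=$(( now_epoch - backup_epoch ))
  if (( age <= rpo_seconds )); then
    echo "OK: newest backup is ${age}s old (RPO ${rpo_seconds}s)"
  else
    echo "VIOLATION: newest backup is ${age}s old (RPO ${rpo_seconds}s)"
    return 1
  fi
}
```

    On a system with GNU coreutils, the backup time could come from a file's modification time, for example rpo_status "$(stat -c %Y /backups/latest.sql.gz)" "$(date +%s)" 3600, where the path is a placeholder.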

    Disaster Scenarios to Plan For

    Hardware Failure

    Risk: Disk failure, server hardware problems

    Mitigation: Regular snapshots, RAID on dedicated servers, migration to redundant cloud infrastructure

    Human Error

    Risk: Accidental deletion, configuration mistakes

    Mitigation: Multiple backups with retention, test restorations, access controls

    Security Breach

    Risk: Ransomware, data theft, system compromise

    Mitigation: Offsite backups, immutable backups, security monitoring, incident response plan

    Datacenter Outage

    Risk: Power failure, network issues, natural disasters

    Mitigation: Multi-region deployments, offsite backups, documented failover procedures

    Backup Strategy

    Implement 3-2-1 Backup Rule

    • 3 copies of data (1 production + 2 backups)
    • 2 different storage types (cloud snapshot + external storage)
    • 1 copy offsite/off-platform

    Automated Backup Schedule

    • Hourly: Critical databases with transaction logs
    • Daily: Production systems and important data
    • Weekly: Full system snapshots
    • Monthly: Long-term archival backups
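    One possible way to drive the schedule above is a single job that runs every hour and labels each run with the highest tier that applies. This is a sketch under assumptions: the function name backup_tier is hypothetical, and the choice of midnight, Sunday, and the first of the month as tier boundaries is illustrative rather than prescribed.

```shell
#!/bin/bash
# Hypothetical helper: pick the backup tier for a given point in time.
# Arguments: day of month (01-31), day of week (1-7, Monday is 1), hour (00-23).
backup_tier() {
  local dom="$1" dow="$2" hour="$3"
  # The 10# prefix forces base 10 so values like "08" are not read as octal.
  if (( 10#$dom == 1 && 10#$hour == 0 )); then
    echo monthly
  elif (( 10#$dow == 7 && 10#$hour == 0 )); then
    echo weekly
  elif (( 10#$hour == 0 )); then
    echo daily
  else
    echo hourly
  fi
}
```

    A backup job started hourly from cron could call backup_tier "$(date +%d)" "$(date +%u)" "$(date +%H)" and use the result to decide what to back up and how long to retain it.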

    Multi-Region Strategy

    Primary/Secondary Setup

    1. Primary Region: Main production environment (e.g., New York)
    2. Secondary Region: Standby environment (e.g., Los Angeles)
    3. Data Replication: Regular snapshots copied to secondary region
    4. DNS Failover: Update DNS to point to secondary region if primary fails
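    The DNS failover decision in step 4 is usually gated on repeated failed health checks, so a single dropped request does not trigger a switch. A minimal sketch, assuming the application exposes an HTTP health URL; the function names are hypothetical, and the DNS update itself is omitted because it depends on your DNS provider.

```shell
#!/bin/bash
# Probe a health URL. Fails on connection errors, timeouts over 5 seconds,
# and HTTP 4xx/5xx responses.
check_health() {
  curl --fail --silent --max-time 5 --output /dev/null "$1"
}

# Return 0 (fail over) only after $2 consecutive failed probes, waiting
# $3 seconds after each failure. Return 1 as soon as a probe succeeds.
primary_is_down() {
  local url="$1" threshold="$2" interval="$3" failures=0
  while (( failures < threshold )); do
    if check_health "$url"; then
      return 1
    fi
    failures=$(( failures + 1 ))
    sleep "$interval"
  done
  return 0
}
```

    A monitoring host outside the primary region could run primary_is_down "https://app.example.com/health" 3 10 and, on success, start the documented failover procedure. The URL, threshold, and interval here are placeholders.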

    Snapshot Replication Script

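    #!/bin/bash
    # Replicate an instance snapshot from the primary to the secondary region
    set -euo pipefail

    PRIMARY_REGION="nyc"
    SECONDARY_REGION="lax"
    INSTANCE_ID="your-instance-id"
    # Compute the name once so both regions use the same timestamp
    SNAPSHOT_NAME="dr-backup-$(date +%Y%m%d-%H%M)"
    SNAPSHOT_FILE="$(mktemp /tmp/snapshot.XXXXXX)"

    # Remove the local copy on exit, including after a failure
    trap 'rm -f "$SNAPSHOT_FILE"' EXIT

    # Create snapshot in primary
    echo "Creating snapshot in $PRIMARY_REGION..."
    SNAPSHOT_ID=$(openstack --os-region-name "$PRIMARY_REGION" server image create \
      --name "$SNAPSHOT_NAME" \
      "$INSTANCE_ID" -f value -c id)

    # Wait for snapshot to complete; stop if the image enters a failed state
    while true; do
      STATUS=$(openstack --os-region-name "$PRIMARY_REGION" \
        image show "$SNAPSHOT_ID" -f value -c status)
      case "$STATUS" in
        active) break ;;
        killed|deleted) echo "Snapshot failed with status: $STATUS" >&2; exit 1 ;;
      esac
      sleep 30
    done

    # Download snapshot
    echo "Downloading snapshot..."
    openstack --os-region-name "$PRIMARY_REGION" \
      image save --file "$SNAPSHOT_FILE" "$SNAPSHOT_ID"

    # Upload to secondary region
    echo "Uploading to $SECONDARY_REGION..."
    openstack --os-region-name "$SECONDARY_REGION" image create \
      --file "$SNAPSHOT_FILE" \
      --disk-format qcow2 \
      --container-format bare \
      "$SNAPSHOT_NAME"

    echo "DR backup complete"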
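    Each run of the replication script leaves another image in the secondary region, so a retention step is usually needed alongside it. The following sketch selects which backups to delete, assuming the dr-backup-YYYYMMDD-HHMM naming used by the script; the function name backups_to_prune and the retention count are illustrative assumptions.

```shell
#!/bin/bash
# Print the backup names that fall outside the newest $1 entries.
# Names are read from stdin, one per line. The YYYYMMDD-HHMM suffix sorts
# chronologically as plain text, so a reverse sort puts the newest first.
backups_to_prune() {
  local keep="$1"
  sort -r | tail -n "+$(( keep + 1 ))"
}
```

    The names could be fed from openstack --os-region-name lax image list -f value -c Name, filtered with grep '^dr-backup-', and each name printed by backups_to_prune 7 then passed to openstack image delete. Review the list before deleting anything.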

    Testing Your DR Plan

    Quarterly

    Full DR test with complete failover

    Monthly

    Restore test from backup

    Weekly

    Verify backups are running

    DR Test Checklist

    • □ Backups are completing successfully
    • □ Can restore from most recent backup
    • □ Restored data is usable and complete
    • □ Can launch instances in secondary region
    • □ DNS failover works as expected
    • □ Application functions in DR environment
    • □ RTO and RPO targets are met
    • □ Team knows their roles and responsibilities
    • □ Documentation is up to date

    Documentation Requirements

    Maintain these critical documents:

    DR Runbook

    Step-by-step recovery procedures

    Contact List

    Who to call during a disaster

    System Architecture

    Diagrams of infrastructure

    Configuration Backup

    All server configurations

    Credential Vault

    Secure access to critical passwords

    Service Dependencies

    What depends on what

    Cost vs. Recovery Time

    Basic: Daily snapshots, weekly offsite - RTO: 4-8 hours, RPO: 24 hours
    Standard: Hourly snapshots, standby instances - RTO: 1-2 hours, RPO: 1 hour
    High Availability: Real-time replication, hot standbys - RTO: Minutes, RPO: Near-zero

    Remember

    The cost of downtime usually far exceeds the cost of proper DR planning. Calculate your potential losses from an hour, day, or week of downtime to justify DR investment.
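    The loss calculation can start as simple arithmetic. This sketch assumes a flat revenue figure per hour; the value 500 is a made-up placeholder, and a real estimate would also include recovery labor, penalties, and lost customers.

```shell
#!/bin/bash
# Estimate downtime loss in whole currency units.
# Arguments: revenue per hour, downtime in hours.
downtime_cost() {
  local revenue_per_hour="$1" hours="$2"
  echo $(( revenue_per_hour * hours ))
}

# Compare an hour, a day, and a week of downtime at a placeholder rate.
for entry in "hour:1" "day:24" "week:168"; do
  label="${entry%%:*}"
  hours="${entry##*:}"
  echo "One $label of downtime: $(downtime_cost 500 "$hours")"
done
```

    Setting these figures beside the yearly cost of each tier above gives a rough basis for choosing between them.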