Part 7 of 7

Backups and Disaster Recovery with pgBackRest

The operational maturity checkpoint. Continuous archiving, encrypted offsite backups, and a written DR runbook.

Prerequisites: Working Postgres from Part 1; ideally Part 6's cluster
Time to Complete: 55–80 minutes
External: S3-compatible bucket (B2/Wasabi/RamNode Cloud Object Storage)

Why pg_dump Is Not a Backup Strategy for Production

• No point-in-time recovery — you restore to the moment pg_dump ran, never to one minute before the bad UPDATE
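• Restoring a plain SQL dump is single-threaded, and even a parallel pg_restore has to reload every row and rebuild every index, which is slow once a database passes a few GB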
• Long-running dumps hold transaction snapshots that prevent vacuum from cleaning up dead tuples
• Schema drift between dump and database is invisible until you try to restore

pg_dump is a fine logical export tool. It is not a backup.

The Continuous Archiving Model

Take a base backup of the data directory once. Stream WAL segments to the repository continuously after that. To restore to any point in time, lay down the base backup and replay WAL up to your chosen target. This is what pgBackRest, Barman, and WAL-G all implement; they differ in the operational ergonomics.

pgBackRest vs Barman vs WAL-G

All three work. pgBackRest gets the spotlight here because it has the strongest combination of features for a self-hosted Postgres on a VPS:

• Incremental block-level backups: with the repo-bundle and repo-block options enabled, only the changed blocks of a file ship over the wire
• Built-in encryption (AES-256-CBC) and parallel compression (zstd/gzip/lz4)
• First-class S3, Azure Blob, and GCS support
• Easy parallel restore and PITR with a single pgbackrest restore

Install pgBackRest 2.x from PGDG

sudo apt install -y pgbackrest

Choosing the Repository Location

Two practical options on a RamNode deployment:

• A small dedicated RamNode VPS as a backup host — fast restore, predictable bandwidth, no per-GB egress fees within the same datacenter
• An S3-compatible bucket — Backblaze B2 ($6/TB/month) and Wasabi ($6.99/TB/month) are the cost-effective defaults; RamNode Cloud Object Storage works equally well

The 3-2-1 Rule for Self-Hosters

Three copies of the data, on two different media, with one offsite. For a one-VPS Postgres that means: production data + a local fast-restore repo on a second VPS in the same region + an encrypted S3-compatible offsite repo. pgBackRest supports multiple repositories in a single configuration, so one setup can satisfy the rule. WAL is archived to every configured repository, but each backup run targets one repository (repo1 unless --repo says otherwise).

pgbackrest.conf Walkthrough

/etc/pgbackrest/pgbackrest.conf

[global]
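# repo1: fast-restore repo. As written, this path is on the host running pgBackRest;
# if the repo lives on a separate backup VPS, also set repo1-host and repo1-host-user.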
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
repo1-retention-diff=7

# Offsite encrypted S3-compatible repo:
repo2-type=s3
repo2-s3-endpoint=s3.us-east-005.backblazeb2.com
repo2-s3-bucket=acme-pg-backups
repo2-s3-region=us-east-005
repo2-s3-key=AKIA...
repo2-s3-key-secret=SECRET...
repo2-path=/postgres/pg-cluster
repo2-cipher-type=aes-256-cbc
repo2-cipher-pass=GENERATE-A-LONG-RANDOM-PASSPHRASE-HERE
repo2-retention-full=4
repo2-retention-diff=14

process-max=4
log-level-console=info
log-level-file=detail
compress-type=zst
compress-level=3

[pg-cluster]
pg1-path=/var/lib/postgresql/17/main
pg1-port=5432
pg1-user=postgres
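
The file above holds the S3 secret key and the repository passphrase in plain text, so it is worth confirming that other users on the host cannot read it. A minimal sketch of such a check, assuming GNU coreutils stat; the function name conf_is_private is hypothetical and not part of pgBackRest:

```shell
# Succeed only if FILE grants no permissions to "other" users.
# Assumes GNU coreutils stat, where -c %a prints the octal mode (e.g. 640).
conf_is_private() {
  local mode
  mode=$(stat -c %a "$1") || return 2
  # The last octal digit is the "other" permission set; 0 means no access.
  case $mode in
    *0) return 0 ;;
    *)  return 1 ;;
  esac
}
```

A typical way to reach that state is to make the file owned by postgres with mode 640, then run conf_is_private /etc/pgbackrest/pgbackrest.conf as part of provisioning.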

archive_command Setup

postgresql.conf

archive_mode = on
archive_command = 'pgbackrest --stanza=pg-cluster archive-push %p'
max_wal_senders = 10
wal_level = replica

sudo systemctl restart postgresql@17-main

Create the Stanza, First Backup, Verify

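sudo -u postgres pgbackrest --stanza=pg-cluster --log-level-console=info stanza-create
# backup writes to repo1 unless --repo is given, so seed each repository explicitly:
sudo -u postgres pgbackrest --stanza=pg-cluster --repo=1 --type=full --log-level-console=info backup
sudo -u postgres pgbackrest --stanza=pg-cluster --repo=2 --type=full --log-level-console=info backup
sudo -u postgres pgbackrest --stanza=pg-cluster check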

Backup Types and a Sensible Schedule

Common rhythm:

• Full — weekly (Sunday 02:00). Slow, complete.
• Differential — daily (02:00 Mon–Sat). Changes since the last full.
• Incremental — hourly. Changes since the last backup of any type.

/etc/cron.d/pgbackrest

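# m h dom mon dow user command
# repo1: hourly incrementals skip 02:00 so they never fight the full/diff run for the backup lock
0  2        * * 0   postgres pgbackrest --stanza=pg-cluster --repo=1 --type=full backup
0  2        * * 1-6 postgres pgbackrest --stanza=pg-cluster --repo=1 --type=diff backup
0  0-1,3-23 * * *   postgres pgbackrest --stanza=pg-cluster --repo=1 --type=incr backup
# repo2 (offsite): needs its own jobs because backup only targets repo1 by default
30 3        * * 0   postgres pgbackrest --stanza=pg-cluster --repo=2 --type=full backup
30 3        * * 1-6 postgres pgbackrest --stanza=pg-cluster --repo=2 --type=diff backup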
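
One way to keep two jobs from ever being scheduled for the same minute is to let a single hourly job pick the backup type itself. The sketch below encodes the rhythm described above (full on Sunday 02:00, diff at 02:00 on other days, incr otherwise); the function name backup_type_for is hypothetical.

```shell
# Map a day of week (0 = Sunday, as in cron) and an hour (0-23) to the
# backup type from the schedule: weekly full, daily diff, hourly incr.
backup_type_for() {
  local dow=$1 hour=$2
  if [ "$hour" -eq 2 ]; then
    if [ "$dow" -eq 0 ]; then echo full; else echo diff; fi
  else
    echo incr
  fi
}
```

A wrapper script run hourly by cron could then call pgbackrest --stanza=pg-cluster --type="$(backup_type_for "$(date +%w)" "$(date +%H)")" backup. Keeping the date calls inside a script rather than in the crontab line avoids cron's special handling of the % character.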

Retention Policies

pgBackRest retention is expressed as counts of full and differential backups. With repo1-retention-full=2, the two most recent full backups are kept, plus all diffs and incrementals that depend on them, plus the WAL needed to reach the oldest restore point; repo1-retention-diff=7 likewise keeps the seven most recent differentials. Set repoN-retention-archive (for example repo1-retention-archive) if you want WAL pruning that diverges from backup retention.
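
To see what a retention count means in calendar terms, the arithmetic can be sketched as below. It assumes count-based full retention and evenly spaced full backups; restore_window_days is a hypothetical helper, not a pgBackRest command.

```shell
# Estimate how far back the retained full backups reach on one repository.
# Args: days between full backups, value of repoN-retention-full.
# Prints "MIN MAX" in days: MIN applies right after a new full backup has
# expired the oldest one, MAX applies just before the next full backup runs.
restore_window_days() {
  local interval=$1 keep=$2
  echo "$(( (keep - 1) * interval )) $(( keep * interval ))"
}
```

With weekly fulls, restore_window_days 7 2 prints "7 14" for repo1 and restore_window_days 7 4 prints "21 28" for repo2, so the oldest restore point on the offsite repo is roughly three to four weeks back.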

Encryption

AES-256-CBC at rest. Generate a long random passphrase, store it in a password manager, and keep a second copy out-of-band; losing the passphrase means losing the offsite backup. Key rotation requires a fresh stanza on the new key; do not try to re-encrypt in place.
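
A minimal sketch of generating such a passphrase from the kernel's random source, assuming head and base64 from coreutils are available; gen_passphrase is a hypothetical name. The result is comparable to what openssl rand -base64 48 produces.

```shell
# Print a 64-character random passphrase: 48 random bytes, base64-encoded.
# 48 bytes encode to exactly 64 characters, so there is no "=" padding.
gen_passphrase() {
  head -c 48 /dev/urandom | base64 | tr -d '\n'
}
```

Paste the output into repo2-cipher-pass before running stanza-create, and record it in the password manager at the same time.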

Compression

Default is gzip. zstd at level 3 typically matches or modestly beats gzip's compression ratio while using noticeably less CPU time, so use it unless your pgBackRest build lacks zstd support.
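
Compression results depend heavily on the data, so it can be worth measuring on a sample of your own before settling on a setting. The helper below is a small sketch for that; compressed_bytes is a hypothetical name and the compressor is whatever command you pass in.

```shell
# Print the size in bytes of FILE after piping it through a compressor.
# Usage: compressed_bytes FILE COMMAND [ARGS...]
compressed_bytes() {
  local file=$1
  shift
  # The compressor reads the file on stdin and writes to stdout; wc counts bytes.
  "$@" < "$file" | wc -c | tr -d ' '
}
```

For example, compare compressed_bytes sample.dat gzip -6 -c with compressed_bytes sample.dat zstd -3 -c, where sample.dat is a representative file of your choosing and the zstd command-line tool is installed.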

Verifying Restores in a Fire Drill

The only backup that works is the one you have tested. Spin up a throwaway VPS, install Postgres, install pgBackRest with the same config, and:

sudo systemctl stop postgresql@17-main
sudo -u postgres rm -rf /var/lib/postgresql/17/main/*
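# --archive-mode=off stops the drill instance from pushing its own WAL into the production repos
sudo -u postgres pgbackrest --stanza=pg-cluster --archive-mode=off --log-level-console=info restore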
sudo systemctl start postgresql@17-main

Schedule this monthly. Document how long the restore takes — that is your real RTO.
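
Recording the restore duration is easier if the drill measures it for you. A minimal sketch, with elapsed_seconds as a hypothetical helper name:

```shell
# Run a command and print on stdout how many whole seconds it took.
# The command's own stdout is redirected to stderr so stdout holds only the number.
# The command's exit status is passed through as the function's return value.
elapsed_seconds() {
  local start end rc=0
  start=$(date +%s)
  "$@" >&2 || rc=$?
  end=$(date +%s)
  echo "$((end - start))"
  return "$rc"
}
```

In a drill, wrap the restore step, for example elapsed_seconds sudo -u postgres pgbackrest --stanza=pg-cluster restore, and add the time Postgres then needs to replay WAL and accept connections before writing the figure into the runbook.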

Point-in-Time Recovery

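sudo systemctl stop postgresql@17-main
# --delta lets restore reuse the existing data directory instead of requiring an empty one
sudo -u postgres pgbackrest --stanza=pg-cluster --delta \
  --type=time --target='2026-05-13 14:32:00+00' \
  --target-action=promote restore
sudo systemctl start postgresql@17-main

The cluster replays WAL up to your target time and then promotes. Include a time zone offset in --target (here +00 for UTC) so the target is unambiguous. Verify the data before you let applications reconnect.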
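
Picking the target is usually a matter of "a little before the bad statement". The sketch below turns a Unix epoch time into a UTC timestamp string for --target; it assumes GNU date, and pitr_target is a hypothetical helper name.

```shell
# Print a UTC timestamp, formatted for pgBackRest's --target option, that lies
# OFFSET seconds (default 60) before the given Unix epoch time.
# Assumes GNU date, which accepts -d "@EPOCH".
pitr_target() {
  local epoch=$1 offset=${2:-60}
  date -u -d "@$((epoch - offset))" '+%Y-%m-%d %H:%M:%S+00'
}
```

If the application log shows the bad UPDATE at epoch time T, then --target="$(pitr_target T)" aims the restore one minute earlier. Widen the offset if clocks on the application and database hosts may differ.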

Restoring to a New Node (Dead Patroni Leader)

When a Patroni leader dies for good, Patroni promotes a replica automatically. Replacing the dead node is then "build a new VPS, restore from pgBackRest, let Patroni reattach it as a replica":

# On the new node, after installing Postgres and Patroni:
sudo -u postgres pgbackrest --stanza=pg-cluster --type=standby restore
sudo systemctl start patroni
patronictl -c /etc/patroni/patroni.yml list  # new node should appear as Replica

Single-Table Restores

pgBackRest does not restore individual tables — it restores the cluster. The fallbacks:

• Restore the cluster to a side instance using PITR, then pg_dump --table=schema.table from the side instance and reload into production
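• For ongoing protection of a small set of tables, set up logical replication to a sidecar instance; a "single-table restore" is then truncating the production table and copying it back from the sidecar. The sidecar receives a bad UPDATE or DELETE too unless its publication is restricted (for example publish = 'insert'), so this suits append-mostly tables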

Monitoring Backup Success

pgbackrest --stanza=pg-cluster info

Wrap that in a check that posts to a webhook (Slack/ntfy) when the most recent backup is older than your tolerance:

/usr/local/bin/check-backup-age.sh

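#!/bin/bash
# Alert when the newest pgBackRest backup is older than THRESHOLD.
# Run as the postgres user so pgbackrest can read its config and repository.
set -uo pipefail

# Stop time (Unix epoch) of the newest backup. Falls back to 0 if the command
# fails or no backup exists, which forces the alert below.
LATEST=$(pgbackrest --stanza=pg-cluster info --output=json \
  | jq -r '.[0].backup[-1].timestamp.stop // 0') || LATEST=0
NOW=$(date +%s)
AGE=$((NOW - ${LATEST:-0}))
THRESHOLD=$((26 * 3600))   # 26 hours
if [ "$AGE" -gt "$THRESHOLD" ]; then
  # -f makes curl exit non-zero on an HTTP error, -S still prints the reason
  curl -fsS -X POST -d "{\"text\":\":warning: pgBackRest backup is $((AGE/3600))h old\"}" \
    -H 'Content-Type: application/json' \
    https://hooks.slack.com/services/T/B/X
fi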

Disaster Recovery Runbook Template

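RPO target: 5 minutes (continuous WAL archiving, with archive_timeout = 300 in postgresql.conf so a quiet server still ships a segment at least every 5 minutes).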
RTO target: 30 minutes for most-recent backup; 60 minutes for PITR.
On-call roles: Incident Lead (decisions), Restore Operator (executes commands), Comms Lead (status updates).

Decision tree:

Is the primary alive?
  YES + corruption?      → PITR to before the corruption (this part)
  NO  + replica healthy? → Patroni already failed over (Part 6); replace the dead node
  NO  + cluster lost?    → Provision new VPS, restore from pgBackRest, repoint apps

Verified commands: linked above in this part. Comms cadence: internal status every 15 min, external status every 30 min until resolved.

Series Wrap-Up

You now have an opinionated production PostgreSQL stack you can stand up on RamNode in an afternoon: tuned single-node, pooled, with pgvector + ParadeDB + pg_duckdb covering RAG, search, and analytics, sitting behind a Patroni HA cluster with verified offsite backups.

For where to go next, the series landing page lists companion topics: logical replication and zero-downtime major upgrades, time-series partitioning with pg_partman, and horizontal sharding with Citus. Build something with what you have here first.
