Backups and Disaster Recovery with pgBackRest
The operational maturity checkpoint. Continuous archiving, encrypted offsite backups, and a written DR runbook.
Why pg_dump Is Not a Backup Strategy for Production
- • No point-in-time recovery: you restore to the moment pg_dump ran, never to one minute before the bad UPDATE
- • Restoring a plain-format dump is single-threaded and slow on a database larger than a few GB
- • Long-running dumps hold transaction snapshots that prevent vacuum from cleaning up dead tuples
- • Schema drift between dump and database is invisible until you try to restore
pg_dump is a fine logical export tool. It is not a backup.
The Continuous Archiving Model
Take a base backup of the data directory once. Stream WAL segments to the repository continuously after that. To restore to any point in time, lay down the base backup and replay WAL up to your chosen target. This is what pgBackRest, Barman, and WAL-G all implement; they differ in the operational ergonomics.
pgBackRest vs Barman vs WAL-G
All three work. pgBackRest gets the spotlight here because it has the strongest combination of features for a self-hosted Postgres on a VPS:
- • Block-level incremental backups: only the changed blocks of a file ship over the wire, not the whole file (enabled with repo-block, which also requires repo-bundle)
- • Built-in encryption (AES-256-CBC) and parallel compression (zstd/gzip/lz4)
- • First-class S3, Azure Blob, and GCS support
- • Easy parallel restore and PITR with a single pgbackrest restore
Install pgBackRest 2.x from PGDG
sudo apt install -y pgbackrest
Choosing the Repository Location
Two practical options on a RamNode deployment:
- • A small dedicated RamNode VPS as a backup host — fast restore, predictable bandwidth, no per-GB egress fees within the same datacenter
- • An S3-compatible bucket — Backblaze B2 ($6/TB/month) and Wasabi ($6.99/TB/month) are the cost-effective defaults; RamNode Cloud Object Storage works equally well
The 3-2-1 Rule for Self-Hosters
Three copies of the data, on two different media, with one offsite. For a one-VPS Postgres that means: production data + a local fast-restore repo on a second VPS in the same region + an encrypted S3-compatible offsite repo. pgBackRest supports two repositories simultaneously, so you can satisfy the rule from a single configuration.
pgbackrest.conf Walkthrough
[global]
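# Fast-restore repo. As written, this path is on the host that runs pgBackRest;
# for a repo on a separate backup VPS, also set repo1-host to that machine.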
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
repo1-retention-diff=7
# Offsite encrypted S3-compatible repo:
repo2-type=s3
repo2-s3-endpoint=s3.us-east-005.backblazeb2.com
repo2-s3-bucket=acme-pg-backups
repo2-s3-region=us-east-005
repo2-s3-key=AKIA...
repo2-s3-key-secret=SECRET...
repo2-path=/postgres/pg-cluster
repo2-cipher-type=aes-256-cbc
repo2-cipher-pass=GENERATE-A-LONG-RANDOM-PASSPHRASE-HERE
repo2-retention-full=4
repo2-retention-diff=14
process-max=4
log-level-console=info
log-level-file=detail
compress-type=zst
compress-level=3
[pg-cluster]
pg1-path=/var/lib/postgresql/17/main
pg1-port=5432
pg1-user=postgres
archive_command Setup
archive_mode = on
archive_command = 'pgbackrest --stanza=pg-cluster archive-push %p'
max_wal_senders = 10
wal_level = replica
sudo systemctl restart postgresql@17-main
Create the Stanza, First Backup, Verify
sudo -u postgres pgbackrest --stanza=pg-cluster --log-level-console=info stanza-create
sudo -u postgres pgbackrest --stanza=pg-cluster --type=full --log-level-console=info backup
sudo -u postgres pgbackrest --stanza=pg-cluster check
Backup Types and a Sensible Schedule
Common rhythm:
- • Full — weekly (Sunday 02:00). Slow, complete.
- • Differential — daily (02:00 Mon–Sat). Changes since the last full.
- • Incremental — hourly. Changes since the last backup of any type.
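The rhythm above can also be expressed as one function that picks the backup type from the clock, so a single hourly job never launches two backups in the same minute. This is a minimal sketch, not part of pgBackRest: the function name `backup_type_for` and the idea of a wrapper script are assumptions made for illustration.

```shell
# Map day-of-week (0 = Sunday, as in cron and `date +%w`) and hour (0-23)
# to the backup type from the schedule: full on Sunday 02:00, diff at 02:00
# on other days, incr for every other hour.
backup_type_for() {
  dow=$1
  hour=$2
  if [ "$hour" -eq 2 ]; then
    if [ "$dow" -eq 0 ]; then
      echo full
    else
      echo diff
    fi
  else
    echo incr
  fi
}

# Hypothetical use inside a wrapper run hourly as the postgres user:
# pgbackrest --stanza=pg-cluster \
#   --type="$(backup_type_for "$(date +%w)" "$(date +%H)")" backup
```

One hourly cron entry calling such a wrapper would stand in for the three per-type entries; the per-type crontab follows for comparison.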
# m h dom mon dow user command
0 2 * * 0 postgres pgbackrest --stanza=pg-cluster --type=full backup
0 2 * * 1-6 postgres pgbackrest --stanza=pg-cluster --type=diff backup
# Skip 02:00 so the incremental does not collide with the full/diff run
0 0-1,3-23 * * * postgres pgbackrest --stanza=pg-cluster --type=incr backup
Retention Policies
pgBackRest retention is expressed in terms of full and differential backups. With repo1-retention-full=2, two full backups are kept, plus all dependent diffs and incrementals, plus the WAL needed to reach the oldest restore point. Add repo*-retention-archive if you want WAL pruning that diverges from backup retention.
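To see what a count-based retention setting buys in days, a little arithmetic helps. The sketch below assumes fulls run at a fixed interval and that expiry runs right after each successful full; the function name `restore_window_days` is made up for this example.

```shell
# Print "MIN MAX": how far back (in days) the oldest retained full backup
# reaches just after expiry (MIN) and just before the next full (MAX).
restore_window_days() {
  full_interval_days=$1   # days between full backups
  retention_full=$2       # value of repoN-retention-full
  min=$(( (retention_full - 1) * full_interval_days ))
  max=$(( retention_full * full_interval_days ))
  echo "$min $max"
}

restore_window_days 7 2   # weekly fulls with repo1-retention-full=2
restore_window_days 7 4   # weekly fulls with repo2-retention-full=4
```

With weekly fulls this prints `7 14` for the local repo and `21 28` for the offsite one, so under these assumptions only the first number is a restore window you can count on at any moment.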
Encryption
AES-256-CBC at rest. Generate a long random passphrase, store it in a password manager, and keep a second copy out-of-band; losing the passphrase means losing the offsite backup. Key rotation requires a fresh stanza on the new key; do not try to re-encrypt in place.
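One way to produce such a passphrase is to base64-encode bytes from the kernel's random source. This is a sketch under the assumption that `head`, `base64`, and `/dev/urandom` are available (true on a stock Debian or Ubuntu VPS); the function name is hypothetical.

```shell
# Emit a 64-character passphrase: 48 random bytes encode to exactly
# 64 base64 characters with no padding.
gen_repo_passphrase() {
  head -c 48 /dev/urandom | base64 | tr -d '\n'
  echo
}

gen_repo_passphrase
```

Paste the output into repo2-cipher-pass and into the password manager in the same sitting, before the first backup is taken.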
Compression
Default is gzip. zstd at level 3 gets you 30–50% better ratios at similar CPU cost — use it unless you are CPU-starved on the database host.
Verifying Restores in a Fire Drill
The only backup that works is the one you have tested. Spin up a throwaway VPS, install Postgres, install pgBackRest with the same config, and:
sudo systemctl stop postgresql@17-main
# Throwaway VPS only. The glob is expanded by a shell running as postgres,
# because your own shell cannot read the 0700 data directory.
sudo -u postgres sh -c 'rm -rf /var/lib/postgresql/17/main/*'
sudo -u postgres pgbackrest --stanza=pg-cluster --log-level-console=info restore
sudo systemctl start postgresql@17-main
Schedule this monthly. Document how long the restore takes; that is your real RTO.
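Recording the restore time is easier when the drill measures itself. The helper below is a minimal sketch; `measure_rto` is a hypothetical name, and it times whatever command it is given.

```shell
# Run any command, then report its exit status and wall-clock duration.
measure_rto() {
  start=$(date +%s)
  status=0
  "$@" || status=$?
  end=$(date +%s)
  echo "exit=$status elapsed_seconds=$((end - start))"
  return "$status"
}

# Example with a stand-in command; in a drill you would pass the restore:
# measure_rto sudo -u postgres pgbackrest --stanza=pg-cluster restore
measure_rto sleep 1
```

The restore command alone understates the RTO, since Postgres still has to start and replay WAL. Timing a script that runs the whole drill end to end gives a more honest number.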
Point-in-Time Recovery
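sudo systemctl stop postgresql@17-main
# --delta lets pgBackRest restore over the existing, non-empty data directory
sudo -u postgres pgbackrest --stanza=pg-cluster --delta \
  --type=time --target='2026-05-13 14:32:00' \
  --target-action=promote restore
sudo systemctl start postgresql@17-main
The cluster replays WAL up to your target time and then promotes. A target without a UTC offset is read in the server's timezone, so append one (for example +00) if there is any doubt. Verify the data, then continue.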
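Choosing the target is the error-prone step: it has to land before the bad statement and in an unambiguous timezone. The sketch below assumes GNU `date` (standard on Debian and Ubuntu) and an incident time reported in UTC; `pitr_target` is a hypothetical helper name.

```shell
# Print a recovery target N seconds before the reported incident time.
# The input is parsed as UTC and the output carries an explicit +00 offset,
# so it does not depend on the server's timezone setting.
pitr_target() {
  incident_utc=$1      # e.g. '2026-05-13 14:33:00'
  margin_seconds=$2    # how long before the incident replay should stop
  epoch=$(date -u -d "$incident_utc" +%s) || return 1
  date -u -d "@$((epoch - margin_seconds))" '+%Y-%m-%d %H:%M:%S+00'
}

pitr_target '2026-05-13 14:33:00' 60
```

The printed value can be passed as --target. A one-minute margin is only an illustration; pick it from the logs that show when the bad statement actually ran.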
Restoring to a New Node (Dead Patroni Leader)
When a Patroni leader dies for good, Patroni promotes a replica automatically. Replacing the dead node is then "build a new VPS, restore from pgBackRest, let Patroni reattach it as a replica":
# On the new node, after installing Postgres and Patroni:
sudo -u postgres pgbackrest --stanza=pg-cluster --type=standby restore
sudo systemctl start patroni
patronictl -c /etc/patroni/patroni.yml list # new node should appear as Replica
Single-Table Restores
pgBackRest does not restore individual tables — it restores the cluster. The fallbacks:
- • Restore the cluster to a side instance using PITR, then pg_dump --table=schema.table from the side instance and reload into production
- • For ongoing protection of a small set of tables, set up logical replication to a sidecar instance; a "single-table restore" is then just truncating production and copying from the sidecar
Monitoring Backup Success
pgbackrest --stanza=pg-cluster info
Wrap that in a check that posts to a webhook (Slack/ntfy) when the most recent backup is older than your tolerance:
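#!/bin/bash
# Alert when the newest pgBackRest backup is older than THRESHOLD.
# Run as the postgres user (or any user allowed to read the repository).
set -uo pipefail
# timestamp.stop is a Unix epoch. If pgbackrest fails or no backup exists,
# fall back to 0 so the age check below always fires.
LATEST=$(pgbackrest --stanza=pg-cluster info --output=json \
  | jq -r '.[0].backup[-1].timestamp.stop // 0') || LATEST=0
LATEST=${LATEST:-0}
NOW=$(date +%s)
AGE=$((NOW - LATEST))
THRESHOLD=$((26 * 3600)) # 26 hours
if [ "$AGE" -gt "$THRESHOLD" ]; then
  curl -s -X POST -d "{\"text\":\":warning: pgBackRest backup is $((AGE/3600))h old\"}" \
    -H 'Content-Type: application/json' \
    https://hooks.slack.com/services/T/B/X
fi
Disaster Recovery Runbook Template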
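RPO target: 5 minutes (continuous WAL archiving; set archive_timeout to 300 seconds or less so a quiet database still ships a segment inside that window).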
RTO target: 30 minutes for most-recent backup; 60 minutes for PITR.
On-call roles: Incident Lead (decisions), Restore Operator (executes commands), Comms Lead (status updates).
Decision tree:
Is the primary alive?
  YES + corruption?      → PITR to before the corruption (this part)
  NO + replica healthy?  → Patroni already failed over (Part 6); replace the dead node
  NO + cluster lost?     → Provision new VPS, restore from pgBackRest, repoint apps
Verified commands: linked above in this part. Comms cadence: internal status every 15 min, external status every 30 min until resolved.
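The decision tree is small enough to encode as a function, which can be handy as the first step of an incident script. This is a sketch only: the function name and the action labels are invented here, and the "primary alive, no corruption" branch is an assumption because the runbook does not cover it.

```shell
# Inputs are the strings "yes" or "no"; output is a short action label.
dr_action() {
  primary_alive=$1
  corruption=$2
  replica_healthy=$3
  if [ "$primary_alive" = yes ]; then
    if [ "$corruption" = yes ]; then
      echo "pitr-before-corruption"
    else
      echo "no-restore-needed"      # assumption: not a branch in the runbook
    fi
  elif [ "$replica_healthy" = yes ]; then
    echo "replace-dead-node"        # Patroni has already failed over
  else
    echo "restore-to-new-vps"       # cluster lost: restore, then repoint apps
  fi
}

dr_action no no yes
```

The example call prints `replace-dead-node`, matching the second branch of the tree.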
Series Wrap-Up
You now have an opinionated production PostgreSQL stack you can stand up on RamNode in an afternoon: tuned single-node, pooled, with pgvector + ParadeDB + pg_duckdb covering RAG, search, and analytics, sitting behind a Patroni HA cluster with verified offsite backups.
For where to go next, the series landing page lists companion topics: logical replication and zero-downtime major upgrades, time-series partitioning with pg_partman, and horizontal sharding with Citus. Build something with what you have here first.
