Prerequisites
Airflow workloads vary based on the number of DAGs, task concurrency, and worker load. Choose your RamNode plan accordingly.
| Use Case | vCPUs | RAM | Storage |
|---|---|---|---|
| Development / Testing | 2 | 4 GB | 40 GB NVMe |
| Small Production | 4 | 8 GB | 80 GB NVMe |
| Medium Production | 6 | 16 GB | 160 GB NVMe |
| Heavy Workloads | 8+ | 32 GB+ | 320 GB+ NVMe |
Software Requirements
- Ubuntu 24.04 LTS (fresh installation recommended)
- Python 3.10 or later (ships with Ubuntu 24.04)
- A registered domain name with DNS A record pointing to your VPS IP
Initial Server Setup
ssh your-user@your-vps-ip
sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential python3-dev python3-pip \
  python3-venv libpq-dev libffi-dev git curl wget unzip
Configure Firewall
sudo ufw allow OpenSSH
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
sudo ufw status
Create Airflow System User
sudo useradd -m -s /bin/bash airflow
sudo mkdir -p /opt/airflow
sudo chown airflow:airflow /opt/airflow
Install and Configure PostgreSQL
Airflow requires a metadata database to track DAG runs, task instances, and execution state. PostgreSQL is the recommended production backend.
sudo apt install -y postgresql postgresql-contrib
sudo systemctl enable postgresql
sudo systemctl start postgresql
sudo -u postgres psql <<EOF
CREATE USER airflow_user WITH PASSWORD 'your_secure_password';
CREATE DATABASE airflow_db OWNER airflow_user;
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;
\q
EOF
🔐 Security Note
Replace 'your_secure_password' with a strong, randomly generated password. Use openssl rand -base64 32 to generate one. Never commit credentials to version control.
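If you prefer Python to openssl, the standard library's secrets module provides equivalent entropy. A minimal sketch (generate_password is an illustrative helper, not part of any tool used in this guide):

```python
# Alternative to `openssl rand -base64 32`: a URL-safe random token
# from Python's cryptographically secure `secrets` module.
import secrets

def generate_password(n_bytes: int = 32) -> str:
    """Return a URL-safe random token carrying n_bytes of entropy."""
    return secrets.token_urlsafe(n_bytes)

print(generate_password())
```

URL-safe tokens avoid characters that would later need escaping in connection strings.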
Install and Configure Redis
Redis serves as the message broker for CeleryExecutor, queuing tasks for distributed execution across workers.
sudo apt install -y redis-server
sudo systemctl enable redis-server
sudo systemctl start redis-server
# Verify Redis is running
redis-cli ping
# Expected output: PONG
Secure Redis
Edit /etc/redis/redis.conf and set:
bind 127.0.0.1 ::1
requirepass your_redis_password
maxmemory 256mb
maxmemory-policy allkeys-lru
sudo systemctl restart redis-server
Install Apache Airflow
Set Up Python Virtual Environment
sudo -u airflow bash
cd /opt/airflow
python3 -m venv venv
source venv/bin/activate
export AIRFLOW_VERSION=2.10.4
export PYTHON_VERSION=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
export CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install --upgrade pip setuptools wheel
pip install "apache-airflow[celery,postgres,redis]==${AIRFLOW_VERSION}" \
  --constraint "${CONSTRAINT_URL}"
Configure Airflow
export AIRFLOW_HOME=/opt/airflow
echo 'export AIRFLOW_HOME=/opt/airflow' >> ~/.bashrc
# Initialize the database schema and generate airflow.cfg
airflow db migrate
Edit airflow.cfg
[core]
executor = CeleryExecutor
load_examples = False
default_timezone = UTC
parallelism = 32
max_active_runs_per_dag = 4

[database]
sql_alchemy_conn = postgresql+psycopg2://airflow_user:your_secure_password@localhost/airflow_db

[scheduler]
dag_dir_list_interval = 60

[celery]
broker_url = redis://:your_redis_password@localhost:6379/0
result_backend = db+postgresql://airflow_user:your_secure_password@localhost/airflow_db

[webserver]
expose_config = False
# Re-run migration with PostgreSQL backend
airflow db migrate
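If your database password contains characters such as @, : or /, it must be percent-encoded before being embedded in sql_alchemy_conn, or the URI will parse incorrectly. A quick sketch (pg_conn_uri is an illustrative helper, not part of Airflow):

```python
# Percent-encode the password portion of a SQLAlchemy PostgreSQL URI.
from urllib.parse import quote_plus

def pg_conn_uri(user: str, password: str, host: str, db: str) -> str:
    """Build a sql_alchemy_conn value with a safely encoded password."""
    return f"postgresql+psycopg2://{user}:{quote_plus(password)}@{host}/{db}"

print(pg_conn_uri("airflow_user", "p@ss/word", "localhost", "airflow_db"))
# → postgresql+psycopg2://airflow_user:p%40ss%2Fword@localhost/airflow_db
```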
# Create admin user
airflow users create \
--username admin \
--firstname Admin \
--lastname User \
--role Admin \
--email admin@yourdomain.com \
  --password your_admin_password
Configure Systemd Services
Create systemd unit files to manage each Airflow component as a background service with automatic restart.
Create /etc/systemd/system/airflow-webserver.service:
[Unit]
Description=Apache Airflow Webserver
After=network.target postgresql.service redis-server.service
Wants=postgresql.service redis-server.service
[Service]
User=airflow
Group=airflow
Type=simple
Environment=AIRFLOW_HOME=/opt/airflow
ExecStart=/opt/airflow/venv/bin/airflow webserver --port 8080
Restart=on-failure
RestartSec=10
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Create /etc/systemd/system/airflow-scheduler.service:
[Unit]
Description=Apache Airflow Scheduler
After=network.target postgresql.service redis-server.service
Wants=postgresql.service redis-server.service
[Service]
User=airflow
Group=airflow
Type=simple
Environment=AIRFLOW_HOME=/opt/airflow
ExecStart=/opt/airflow/venv/bin/airflow scheduler
Restart=on-failure
RestartSec=10
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Create /etc/systemd/system/airflow-worker.service:
[Unit]
Description=Apache Airflow Celery Worker
After=network.target postgresql.service redis-server.service
Wants=postgresql.service redis-server.service
[Service]
User=airflow
Group=airflow
Type=simple
Environment=AIRFLOW_HOME=/opt/airflow
ExecStart=/opt/airflow/venv/bin/airflow celery worker
Restart=on-failure
RestartSec=10
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Create /etc/systemd/system/airflow-flower.service:
[Unit]
Description=Apache Airflow Celery Flower
After=network.target redis-server.service
[Service]
User=airflow
Group=airflow
Type=simple
Environment=AIRFLOW_HOME=/opt/airflow
ExecStart=/opt/airflow/venv/bin/airflow celery flower --port 5555
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable airflow-webserver airflow-scheduler airflow-worker
sudo systemctl start airflow-webserver airflow-scheduler airflow-worker
# Optional: enable Flower monitoring
sudo systemctl enable airflow-flower
sudo systemctl start airflow-flower
# Verify all services are running
sudo systemctl status airflow-webserver airflow-scheduler airflow-worker
Nginx Reverse Proxy with SSL
sudo apt install -y nginx certbot python3-certbot-nginx
sudo systemctl enable nginx
Create /etc/nginx/sites-available/airflow:
server {
listen 80;
server_name airflow.yourdomain.com;
location / {
proxy_pass http://127.0.0.1:8080;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_read_timeout 90;
}
}
sudo ln -s /etc/nginx/sites-available/airflow /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
# Obtain SSL certificate
sudo certbot --nginx -d airflow.yourdomain.com
Update Airflow for HTTPS
[webserver]
base_url = https://airflow.yourdomain.com
# Leave the cert/key values empty — Nginx terminates TLS
web_server_ssl_cert =
web_server_ssl_key =
enable_proxy_fix = True
sudo systemctl restart airflow-webserver
Deploy Your First DAG
DAGs (Directed Acyclic Graphs) are Python files that define your workflow logic. Place them in the dags folder under AIRFLOW_HOME.
Create /opt/airflow/dags/hello_ramnode.py:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
}

with DAG(
    'hello_ramnode',
    default_args=default_args,
    description='A simple introductory DAG',
    schedule='@daily',
    start_date=datetime(2026, 2, 1),
    catchup=False,
    tags=['example', 'ramnode'],
) as dag:
    check_disk = BashOperator(
        task_id='check_disk_space',
        bash_command='df -h / | tail -1',
    )
    check_memory = BashOperator(
        task_id='check_memory',
        bash_command='free -m | grep Mem',
    )
    log_status = BashOperator(
        task_id='log_status',
        bash_command='echo "Health check completed at $(date)"',
    )
    [check_disk, check_memory] >> log_status
The DAG will appear in the Airflow web UI within 60 seconds (based on dag_dir_list_interval). Enable it from the UI toggle and trigger a manual run to verify.
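The fan-in expression [check_disk, check_memory] >> log_status is an ordinary topological-ordering constraint: log_status runs only after both checks succeed. The standard library's graphlib can illustrate what the scheduler guarantees (this sketch models the dependencies only, not Airflow's execution):

```python
# Model the example DAG's dependencies as a plain topological sort.
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on (its upstream tasks).
deps = {
    "log_status": {"check_disk_space", "check_memory"},
    "check_disk_space": set(),
    "check_memory": set(),
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # log_status always comes last
```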
Production Hardening
Security Configuration
# Generate Fernet key for encryption
python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# Add to airflow.cfg [core]: fernet_key = your_generated_key_here
# Generate webserver session secret
python3 -c "import secrets; print(secrets.token_hex(32))"
# Add to airflow.cfg [webserver]: secret_key = your_secret_key_here
[api]
auth_backends = airflow.api.auth.backend.session
[webserver]
expose_config = False
warn_deployment_exposure = True
Log Rotation
Create /etc/logrotate.d/airflow:
/opt/airflow/logs/*.log {
daily
missingok
rotate 14
compress
delaycompress
notifempty
copytruncate
}
Resource Tuning
| Setting | 4 GB VPS | 16 GB VPS |
|---|---|---|
| parallelism | 8 | 32 |
| max_active_tasks_per_dag | 4 | 16 |
| max_active_runs_per_dag | 2 | 8 |
| worker_concurrency | 4 | 16 |
| dag_dir_list_interval | 120 | 30 |
| min_file_process_interval | 60 | 30 |
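For plan sizes between the table's two columns, a rough starting point is linear interpolation; this sketch simply re-expresses the table above (the suggest helper and the linear scaling are illustrative, not benchmark-derived):

```python
# Interpolate concurrency settings between the table's 4 GB and 16 GB
# columns. Values are rounded starting points; tune against real load.
TUNING = {  # setting: (value at 4 GB RAM, value at 16 GB RAM)
    "parallelism": (8, 32),
    "max_active_tasks_per_dag": (4, 16),
    "worker_concurrency": (4, 16),
}

def suggest(setting: str, ram_gb: float) -> int:
    lo, hi = TUNING[setting]
    ram_gb = min(max(ram_gb, 4), 16)  # clamp to the table's range
    return round(lo + (hi - lo) * (ram_gb - 4) / 12)

print(suggest("parallelism", 8))  # → 16 for an 8 GB VPS
```

The interval settings (dag_dir_list_interval, min_file_process_interval) scale the other way — smaller machines should parse less often — so they are left out of the sketch.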
Monitoring & Health Checks
curl -s https://airflow.yourdomain.com/health | python3 -m json.tool
Automated Health Check Script
Create /opt/airflow/scripts/health_check.sh:
#!/bin/bash
HEALTH=$(curl -sf http://localhost:8080/health)
SCHEDULER=$(echo "$HEALTH" | python3 -c "import sys,json; print(json.load(sys.stdin)['scheduler']['status'])")
if [ "$SCHEDULER" != "healthy" ]; then
echo "[ALERT] Airflow scheduler unhealthy at $(date)" | \
mail -s "Airflow Health Alert" admin@yourdomain.com
sudo systemctl restart airflow-scheduler
fi
chmod +x /opt/airflow/scripts/health_check.sh
# Add to crontab (runs every 5 minutes)
(crontab -l 2>/dev/null; echo "*/5 * * * * /opt/airflow/scripts/health_check.sh") | crontab -
Backup Strategy
Create /opt/airflow/scripts/backup.sh:
#!/bin/bash
BACKUP_DIR=/opt/airflow/backups
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
mkdir -p $BACKUP_DIR
# Backup PostgreSQL database (assumes a ~/.pgpass entry or PGPASSWORD is set)
pg_dump -h localhost -U airflow_user airflow_db | gzip > \
  $BACKUP_DIR/airflow_db_$TIMESTAMP.sql.gz
# Backup DAGs and configuration
tar czf $BACKUP_DIR/airflow_config_$TIMESTAMP.tar.gz \
/opt/airflow/dags \
/opt/airflow/airflow.cfg \
/opt/airflow/plugins
# Retain only last 7 days of backups
find $BACKUP_DIR -name '*.gz' -mtime +7 -delete
echo "Backup completed: $TIMESTAMP"
chmod +x /opt/airflow/scripts/backup.sh
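For environments without GNU find, the retention step can be done in a few lines of Python; this sketch mirrors the find -mtime +7 -delete line above (prune_backups is a hypothetical helper):

```python
# Python equivalent of: find $BACKUP_DIR -name '*.gz' -mtime +7 -delete
import time
from pathlib import Path

def prune_backups(backup_dir: str, keep_days: int = 7) -> list[str]:
    """Delete *.gz files older than keep_days; return the names removed."""
    cutoff = time.time() - keep_days * 86400
    removed = []
    for f in Path(backup_dir).glob("*.gz"):
        if f.stat().st_mtime < cutoff:
            f.unlink()
            removed.append(f.name)
    return removed
```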
# Schedule daily backups at 2 AM
(crontab -l 2>/dev/null; echo "0 2 * * * /opt/airflow/scripts/backup.sh") | crontab -
Upgrading Airflow
# 1. Stop all Airflow services
sudo systemctl stop airflow-webserver airflow-scheduler airflow-worker airflow-flower
# 2. Create a full backup using the backup script
/opt/airflow/scripts/backup.sh
# 3. Upgrade the Airflow package
source /opt/airflow/venv/bin/activate
export AIRFLOW_VERSION=2.x.x  # Target version
export PYTHON_VERSION=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
pip install "apache-airflow[celery,postgres,redis]==${AIRFLOW_VERSION}" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# 4. Run the database migration
airflow db migrate
# 5. Restart all services
sudo systemctl start airflow-webserver airflow-scheduler airflow-worker airflow-flower
Troubleshooting
Webserver won't start
Check logs with journalctl -u airflow-webserver -n 50. Verify PostgreSQL is running and the connection string in airflow.cfg is correct.
DAGs not appearing
Verify DAG files have no syntax errors: python3 /opt/airflow/dags/your_dag.py. Check the dag_dir_list_interval setting.
Tasks stuck in queued
Ensure the Celery worker is running: systemctl status airflow-worker. Check Redis connectivity with redis-cli ping.
High memory usage
Reduce parallelism and worker_concurrency. Increase dag_dir_list_interval to reduce parsing frequency.
502 Bad Gateway in Nginx
Verify the webserver is running on port 8080. Check with curl http://localhost:8080/health.
Permission denied errors
Ensure all files in /opt/airflow are owned by the airflow user: sudo chown -R airflow:airflow /opt/airflow.
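To batch-check an entire DAG folder for syntax errors (the per-file check from "DAGs not appearing" applied to every file), a small sketch may help (broken_dags is a hypothetical helper; it catches syntax errors only, not import failures):

```python
# Byte-compile every .py file under a DAGs folder and report failures,
# without needing Airflow importable in the current environment.
import py_compile
from pathlib import Path

def broken_dags(dags_dir: str) -> list[str]:
    """Return the names of .py files that fail to byte-compile."""
    broken = []
    for f in sorted(Path(dags_dir).glob("*.py")):
        try:
            py_compile.compile(str(f), doraise=True)
        except py_compile.PyCompileError:
            broken.append(f.name)
    return broken
```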
Apache Airflow Deployed Successfully!
Your Apache Airflow instance is now running in production on a RamNode VPS with CeleryExecutor, PostgreSQL metadata backend, Redis message broker, Nginx reverse proxy, and SSL encryption. RamNode's NVMe-backed VPS infrastructure provides the I/O performance and dedicated resources that workflow orchestration demands — starting at just $4/month.
