Architecture Overview
This guide deploys a single-server Nomad cluster suitable for development, staging, or small production workloads. The architecture consists of a Nomad server (scheduler and state store), a co-located Nomad client (task executor), Consul for service discovery, and Docker as the container runtime.
For high-availability production environments, this can be extended to a 3- or 5-server cluster across multiple RamNode VPS instances.
Prerequisites
Recommended VPS Specifications
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 1 vCPU | 2+ vCPU |
| RAM | 2 GB | 4 GB+ |
| Storage | 20 GB NVMe SSD | 40 GB+ NVMe SSD |
| OS | Ubuntu 24.04 LTS | Ubuntu 24.04 LTS |
| Network | 1 Gbps | 1 Gbps |
Before You Begin
- A RamNode VPS provisioned with Ubuntu 24.04 LTS
- Root or sudo SSH access to the VPS
- A domain name or static IP for accessing the Nomad UI (optional but recommended)
- Basic familiarity with Linux command-line administration
Initial Server Setup
```bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl gnupg software-properties-common \
  apt-transport-https ca-certificates unzip jq
```
Configure Firewall
```bash
sudo ufw allow 22/tcp     # SSH
sudo ufw allow 4646/tcp   # Nomad HTTP API
sudo ufw allow 4647/tcp   # Nomad RPC
sudo ufw allow 4648/tcp   # Nomad Serf (WAN gossip)
sudo ufw allow 4648/udp   # Nomad Serf (WAN gossip)
sudo ufw allow 8500/tcp   # Consul HTTP API
sudo ufw allow 8301/tcp   # Consul Serf LAN
sudo ufw allow 8301/udp   # Consul Serf LAN
sudo ufw allow 8300/tcp   # Consul Server RPC
sudo ufw --force enable
```
Security Note: In production, restrict ports 4646, 4647, 4648, 8500, 8300, and 8301 to your private network or VPN CIDR range. Only SSH (22) and application ports should be publicly accessible.
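One way to apply that restriction is to generate the per-port rules from a list, so the CIDR lives in one place and the rules can be reviewed before anything changes. A sketch, where 10.0.0.0/24 is a placeholder for your actual VPN or private-network range:

```bash
# Build the restricted-access rules as text first so they can be reviewed.
# 10.0.0.0/24 is a placeholder: substitute your VPN or private-network CIDR.
cidr="10.0.0.0/24"
rules=$(for port in 4646 4647 4648 8500 8300 8301; do
  echo "ufw allow from ${cidr} to any port ${port}"
done)
printf '%s\n' "$rules"
```

After reviewing the output, apply each line with sudo (for example `printf '%s\n' "$rules" | sudo sh`), then delete the broader public rules added above, e.g. `sudo ufw delete allow 4646/tcp`.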
```bash
sudo hostnamectl set-hostname nomad-server-01
sudo timedatectl set-timezone UTC
```
Create Nomad System User
```bash
sudo useradd --system --home /etc/nomad.d --shell /bin/false nomad
sudo mkdir -p /opt/nomad/data /etc/nomad.d
sudo chown -R nomad:nomad /opt/nomad /etc/nomad.d
```
Install Docker
Nomad uses Docker as its primary task driver for running containerized workloads.
```bash
# Add Docker's official GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add Docker repository
echo "deb [arch=$(dpkg --print-architecture) \
  signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io
sudo systemctl enable --now docker

# Verify the installation
sudo docker run hello-world
```
Install Consul
Consul provides service discovery and health checking for Nomad-scheduled workloads. While optional, it is strongly recommended for production deployments.
```bash
# Add HashiCorp GPG key and repository
wget -O- https://apt.releases.hashicorp.com/gpg | \
  sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] \
  https://apt.releases.hashicorp.com $(lsb_release -cs) main" | \
  sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install -y consul
```
Configure Consul
Edit /etc/consul.d/consul.hcl (the config path used by the apt package) to contain:
```hcl
datacenter = "dc1"
data_dir   = "/opt/consul"
log_level  = "INFO"

server           = true
bootstrap_expect = 1
bind_addr        = "0.0.0.0"
client_addr      = "0.0.0.0"

ui_config {
  enabled = true
}

connect {
  enabled = true
}
```
```bash
sudo systemctl enable consul
sudo systemctl start consul
consul members   # Verify Consul is running
```
Install Nomad
```bash
# HashiCorp repository already configured from Consul step
sudo apt install -y nomad
nomad version   # Verify installation
```
Server Configuration
Edit /etc/nomad.d/nomad.hcl to contain:
```hcl
datacenter = "dc1"
data_dir   = "/opt/nomad/data"
bind_addr  = "0.0.0.0"

server {
  enabled          = true
  bootstrap_expect = 1
}

client {
  enabled = true
}

plugin "docker" {
  config {
    allow_privileged = false
    volumes {
      enabled = true
    }
  }
}

consul {
  address = "127.0.0.1:8500"
}
```
Create Systemd Service
Create /etc/systemd/system/nomad.service with the following unit:
```ini
[Unit]
Description=HashiCorp Nomad
Documentation=https://nomadproject.io/docs/
Wants=network-online.target
After=network-online.target consul.service

[Service]
ExecStart=/usr/bin/nomad agent -config /etc/nomad.d/
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
KillSignal=SIGINT
LimitNOFILE=65536
LimitNPROC=infinity
Restart=on-failure
RestartSec=2
TasksMax=infinity
OOMScoreAdjust=-1000

[Install]
WantedBy=multi-user.target
```
```bash
sudo systemctl daemon-reload
sudo systemctl enable nomad
sudo systemctl start nomad
```
Verify the Cluster
```bash
nomad server members
nomad node status
nomad status
```
💡 Web UI Access: The Nomad UI is available at http://YOUR_VPS_IP:4646/ui. For secure remote access, set up an SSH tunnel: `ssh -L 4646:localhost:4646 root@YOUR_VPS_IP`
Enable ACL Security
Access Control Lists (ACLs) are essential for production Nomad deployments. They control who can submit jobs, view logs, and administer the cluster.
```bash
# Add ACL block to Nomad config
sudo tee -a /etc/nomad.d/nomad.hcl > /dev/null <<'EOF'
acl {
  enabled = true
}
EOF
sudo systemctl restart nomad
sleep 5

# Bootstrap the ACL system (save the output!)
nomad acl bootstrap
```
Critical: Save the Secret ID from the bootstrap output immediately. This is your management token and cannot be retrieved again. Store it in a secure location such as a password manager or HashiCorp Vault.
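With ACLs on, day-to-day access should use scoped tokens rather than the management token. As a sketch, a read-only policy file (the file name readonly.policy.hcl is arbitrary) might look like:

```hcl
# Read-only access to jobs in the default namespace and to node status
namespace "default" {
  policy = "read"
}

node {
  policy = "read"
}
```

It can be registered with `nomad acl policy apply readonly readonly.policy.hcl` and attached to a new token with `nomad acl token create -policy=readonly -type=client`; both commands require the management token from the bootstrap step.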
```bash
export NOMAD_TOKEN="<your-bootstrap-secret-id>"
# Optionally persist in your shell profile:
echo 'export NOMAD_TOKEN="<your-bootstrap-secret-id>"' >> ~/.bashrc
```
Deploy Your First Job
Create ~/hello-web.nomad.hcl with the following job specification:
```hcl
job "hello-web" {
  datacenters = ["dc1"]
  type        = "service"

  group "web" {
    count = 2

    network {
      port "http" {
        to = 80
      }
    }

    service {
      name     = "hello-web"
      port     = "http"
      provider = "consul"

      check {
        type     = "http"
        path     = "/"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "nginx" {
      driver = "docker"

      config {
        image = "nginx:alpine"
        ports = ["http"]
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}
```
```bash
# Preview the deployment plan
nomad job plan ~/hello-web.nomad.hcl

# Deploy the job
nomad job run ~/hello-web.nomad.hcl

# Check job status
nomad job status hello-web

# View allocation details
nomad alloc status <alloc-id>
```
💡 Consul Integration: With Consul running, your Nginx instances are automatically registered as services. Check http://YOUR_VPS_IP:8500/ui to see them in the Consul service catalog with health checks.
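The job above uses a dynamic host port, which Nomad picks from its ephemeral range. If the service must be reachable on a fixed, well-known port, the network block can use a static mapping instead; a sketch, where 8080 is an arbitrary choice (note that a static port allows only one such allocation per node):

```hcl
network {
  port "http" {
    static = 8080  # fixed host port; one allocation per node can bind it
    to     = 80    # container port
  }
}
```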
Production Hardening
TLS Encryption
Enable TLS to encrypt all Nomad RPC and HTTP communication:
```bash
# Generate a CA and server certificates
nomad tls ca create
nomad tls cert create -server -region global
nomad tls cert create -client

# Move certificates to the config directory
sudo mkdir -p /etc/nomad.d/tls
sudo mv *.pem /etc/nomad.d/tls/
sudo chown -R nomad:nomad /etc/nomad.d/tls
```
Add a tls stanza to /etc/nomad.d/nomad.hcl, then restart Nomad with sudo systemctl restart nomad:
```hcl
tls {
  http = true
  rpc  = true

  ca_file   = "/etc/nomad.d/tls/nomad-agent-ca.pem"
  cert_file = "/etc/nomad.d/tls/global-server-nomad.pem"
  key_file  = "/etc/nomad.d/tls/global-server-nomad-key.pem"

  verify_server_hostname = true
  verify_https_client    = false
}
```
Resource Limits
```bash
echo "nomad soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "nomad hard nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "nomad soft nproc 65536" | sudo tee -a /etc/security/limits.conf
echo "nomad hard nproc 65536" | sudo tee -a /etc/security/limits.conf
```
Log Rotation
```bash
sudo mkdir -p /etc/systemd/journald.conf.d
sudo tee /etc/systemd/journald.conf.d/nomad.conf > /dev/null <<'EOF'
[Journal]
SystemMaxUse=500M
SystemMaxFileSize=50M
MaxRetentionSec=30day
EOF
sudo systemctl restart systemd-journald
```
Monitoring & Observability
Nomad exposes runtime metrics at /v1/metrics; the endpoint returns JSON by default and Prometheus exposition format when requested with ?format=prometheus (once enabled in the agent's telemetry configuration).
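Prometheus-format scraping is not on by default; it is enabled through a telemetry stanza in the agent configuration. A sketch (the interval value is illustrative) to add to /etc/nomad.d/nomad.hcl, followed by a Nomad restart:

```hcl
telemetry {
  collection_interval        = "10s"
  prometheus_metrics         = true   # serve /v1/metrics?format=prometheus
  publish_allocation_metrics = true
  publish_node_metrics       = true
}
```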
```bash
# Cluster health
nomad server members
nomad node status

# Job-level monitoring
nomad job status hello-web
nomad alloc logs <alloc-id>

# System metrics
curl -s http://localhost:4646/v1/metrics | \
  jq '.Gauges[] | select(.Name | contains("nomad"))'
```
Key Metrics to Watch
| Metric | Description | Alert Threshold |
|---|---|---|
| nomad.client.allocs.running | Active allocations per node | Varies by capacity |
| nomad.client.host.cpu.idle | Available CPU percentage | < 15% |
| nomad.client.host.memory.avail | Free memory in bytes | < 256 MB |
| nomad.client.host.disk.available | Available disk space | < 10% |
| nomad.nomad.rpc.query | RPC query rate | Sudden spikes |
Scaling Your Cluster
Adding Client Nodes
On each new RamNode VPS, install Nomad and Docker, then use this client-only configuration:
```hcl
datacenter = "dc1"
data_dir   = "/opt/nomad/data"
bind_addr  = "0.0.0.0"

client {
  enabled = true
  servers = ["<nomad_server_ip>:4647"]
}

plugin "docker" {
  config {
    allow_privileged = false
    volumes { enabled = true }
  }
}

consul {
  address = "<consul_server_ip>:8500"
}
```
Multi-Server HA Cluster
For high availability, provision 3 or 5 RamNode VPS instances as Nomad servers with bootstrap_expect set to the total server count. Nomad uses Raft consensus and can tolerate the failure of (N-1)/2 servers.
| Cluster Size | Fault Tolerance | RamNode Config |
|---|---|---|
| 1 server | None (dev/staging) | 1× Premium KVM 4GB ($24/mo) |
| 3 servers | 1 server failure | 3× Premium KVM 4GB ($72/mo) |
| 5 servers | 2 server failures | 5× Premium KVM 4GB ($120/mo) |
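For a 3-server cluster, each server's configuration replaces bootstrap_expect = 1 and points at its peers. A sketch with placeholder private IPs:

```hcl
server {
  enabled          = true
  bootstrap_expect = 3

  server_join {
    retry_join = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # placeholder peer IPs
  }
}
```

retry_join lets each server keep retrying until the quorum forms, so boot order does not matter.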
Troubleshooting
| Issue | Diagnosis | Solution |
|---|---|---|
| Nomad won't start | journalctl -u nomad -f | Check config syntax: nomad config validate /etc/nomad.d/ |
| No Docker driver | nomad node status -self | Ensure Docker is running: systemctl status docker |
| Job stuck in pending | nomad job status <job> | Check resource constraints and node eligibility |
| Consul unavailable | consul members | Verify Consul is running and ports are open |
| ACL bootstrap fails | Check if already bootstrapped | Write the reset index to <data_dir>/acl-bootstrap-reset on a server, then re-run nomad acl bootstrap |
Useful Resources
| Resource | URL |
|---|---|
| Nomad Documentation | developer.hashicorp.com/nomad/docs |
| Consul Documentation | developer.hashicorp.com/consul/docs |
| Nomad Job Specification | developer.hashicorp.com/nomad/docs/job-specification |
| Nomad Tutorials | developer.hashicorp.com/nomad/tutorials |
Nomad Deployed Successfully!
Your RamNode VPS is now running HashiCorp Nomad with Docker container orchestration, Consul service discovery, ACL security, and TLS encryption. Scale horizontally by adding client nodes, or move to a multi-server HA configuration for fault tolerance.
