Scale your applications to handle traffic spikes efficiently
Scale your applications automatically to handle traffic spikes and distribute load across multiple instances for high availability.
Vertical scaling: add more resources (CPU, RAM) to an existing instance.
Pro: Simple to implement
Con: Limited by the largest available instance size; resizing usually requires downtime
Horizontal scaling: add more instances behind a load balancer.
Pro: Nearly unlimited scaling, no downtime
Con: More complex architecture
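For the vertical option, a sketch using the standard OpenStack CLI resize workflow (the flavor and server names are placeholders; the commands are printed as a dry run so you can review them before executing):

```shell
#!/bin/sh
# Vertical scaling sketch: move an instance to a larger flavor.
# "m1.large" and "web-server-1" are placeholder names.
# run() prints each command instead of executing it (dry run);
# remove the echo to perform the resize for real.
run() { echo "$@"; }

run openstack server resize --flavor m1.large web-server-1
# Once the instance has rebooted into the new flavor, confirm the resize:
run openstack server resize confirm web-server-1
```

The confirm step is what makes the resize permanent; until then OpenStack keeps the old instance around so the operation can be reverted.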
Our cloud platform includes Cloud Load Balancers for distributing traffic across multiple instances.
Round Robin: distributes requests evenly across all instances
Least Connections: sends new requests to the instance with the fewest active connections
Source IP Hash: routes the same client to the same instance (useful for session affinity)
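If you manage the load balancer through the OpenStack (Octavia) CLI rather than the Control Panel, the algorithm is chosen when the pool is created. A sketch, with placeholder names and a dry-run wrapper so nothing is executed as written:

```shell
#!/bin/sh
# Selecting a balancing algorithm with the OpenStack load balancer CLI.
# Valid --lb-algorithm values: ROUND_ROBIN, LEAST_CONNECTIONS, SOURCE_IP.
# "web-pool" and "my-listener" are placeholder names.
run() { echo "$@"; }   # dry run: print the command instead of executing it

run openstack loadbalancer pool create \
  --name web-pool \
  --lb-algorithm ROUND_ROBIN \
  --listener my-listener \
  --protocol HTTP
```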
While the platform does not include built-in auto-scaling, you can implement manual horizontal scaling:
# Using OpenStack CLI
openstack server create \
  --image my-golden-image \
  --flavor m1.small \
  --network public \
  web-server-2

Add the new instance to your load balancer pool through the Cloud Control Panel or API.
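Registering the new instance with the load balancer can also be scripted. A sketch assuming an Octavia-style CLI; the pool name and address are placeholders (look up the real fixed IP with `openstack server show web-server-2`), and the command is printed as a dry run:

```shell
#!/bin/sh
# Add the newly created instance to a load balancer pool.
# "web-pool" and 192.0.2.12 are placeholders for your pool and instance IP.
run() { echo "$@"; }   # dry run: print the command instead of executing it

run openstack loadbalancer member create \
  --name web-server-2 \
  --address 192.0.2.12 \
  --protocol-port 80 \
  web-pool
```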
Create your own auto-scaling solution using monitoring and the OpenStack API:
#!/bin/bash
# Simple auto-scaling script
LB_POOL_ID="your-load-balancer-pool-id"
IMAGE_ID="your-golden-image-id"
FLAVOR_ID="m1.small"

# Get average CPU usage across all active instances
# (cpu_util must be exposed by your telemetry; adjust the lookup if your
# deployment reports the metric elsewhere)
AVG_CPU=$(nova list | awk -F'|' '/ACTIVE/ {gsub(/ /, "", $2); print $2}' | while read -r id; do
    nova show "$id" | awk -F'|' '/cpu_util/ {gsub(/ /, "", $3); print $3}'
done | awk '{sum+=$1; count++} END {if (count) print sum/count; else print 0}')

# Scale up if average CPU > 80%
if [ "$(echo "$AVG_CPU > 80" | bc)" -eq 1 ]; then
    echo "High CPU detected, scaling up..."
    nova boot --image "$IMAGE_ID" --flavor "$FLAVOR_ID" "web-server-$(date +%s)"
    # Remember to add the new server to the $LB_POOL_ID pool as well
fi

# Scale down if average CPU < 20% and more than 2 instances are running
INSTANCE_COUNT=$(nova list | grep -c ACTIVE)
if [ "$(echo "$AVG_CPU < 20" | bc)" -eq 1 ] && [ "$INSTANCE_COUNT" -gt 2 ]; then
    echo "Low CPU detected, scaling down..."
    OLDEST=$(nova list --sort created_at:asc | awk -F'|' '/ACTIVE/ {gsub(/ /, "", $2); print $2; exit}')
    nova delete "$OLDEST"
fi

Schedule the script with cron:

# Check every 5 minutes
*/5 * * * * /usr/local/bin/autoscale.sh >> /var/log/autoscale.log 2>&1

For horizontal scaling to work effectively, your application needs to be designed properly:
Your application should not store session data locally on the instance.
Solutions: store sessions in a shared cache such as Redis or Memcached, in the database, or client-side in signed cookies.
User-uploaded files and assets should be stored centrally, for example on object storage or a shared network filesystem, so every instance serves the same content.
Your database should be separate from your web servers, running on its own instance, so web instances can be added and removed without affecting your data.
Several third-party tools can help implement auto-scaling:
Deploy a Kubernetes cluster for container orchestration with built-in auto-scaling.
Lighter alternative to Kubernetes with basic scaling capabilities.
Infrastructure as Code for managing instance lifecycle.
Automation and orchestration for scaling operations.
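As an illustration of the Kubernetes option, its built-in Horizontal Pod Autoscaler can be configured from the command line. A sketch with a placeholder deployment name, printed as a dry run:

```shell
#!/bin/sh
# Kubernetes Horizontal Pod Autoscaler sketch.
# "web" is a placeholder deployment name.
run() { echo "$@"; }   # dry run: print the command instead of executing it

# Keep between 2 and 10 replicas, targeting 80% average CPU utilisation
run kubectl autoscale deployment web --min=2 --max=10 --cpu-percent=80
```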
CPU Usage: scale up when consistently above 70-80%
Memory Usage: watch for memory pressure and swapping
Request Rate: requests per second across all instances
Response Time: average and 95th percentile latency
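The response-time metrics above can be computed from a plain file of per-request latencies. A minimal sketch in shell; the sample values are made up for illustration:

```shell
#!/bin/sh
# Compute average and 95th-percentile latency from one-value-per-line data.
# Sample latencies in milliseconds, made up for illustration:
printf '%s\n' 100 120 110 105 400 115 108 112 109 111 > /tmp/latencies.txt

sort -n /tmp/latencies.txt > /tmp/latencies.sorted
COUNT=$(wc -l < /tmp/latencies.sorted)
AVG=$(awk '{sum+=$1} END {printf "%.0f", sum/NR}' /tmp/latencies.sorted)
P95_LINE=$(( (COUNT * 95 + 99) / 100 ))   # ceiling of 0.95 * COUNT
P95=$(sed -n "${P95_LINE}p" /tmp/latencies.sorted)
echo "avg=${AVG}ms p95=${P95}ms"
```

Note how the single slow request (400 ms) barely moves the average but dominates the 95th percentile; that is why both are worth watching.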
Need help designing a scalable architecture? Contact our support team or check out our Professional Services for architecture consulting.