Observability Framework

    Deploy OpenTelemetry on RamNode VPS

    Full-stack observability with traces, metrics, and logs. Unified telemetry collection using the industry-standard open-source framework.

    Ubuntu 22.04/24.04 LTS
    Docker Compose
    ⏱️ 30-45 minutes

    What is OpenTelemetry?

    OpenTelemetry (OTel) is the industry-standard open-source observability framework for collecting, processing, and exporting telemetry data. It provides a unified approach to traces, metrics, and logs, eliminating the need for multiple vendor-specific agents.

    • Distributed Tracing: Track requests across services with Jaeger
    • Metrics Collection: Time-series data via Prometheus
    • Host Metrics: CPU, memory, disk, and network out of the box
    • Vendor Neutral: Single SDK for any backend
    • Grafana Dashboards: Unified visualization across all signals
    • Batch Processing: Efficient data pipeline with configurable limits

    This guide builds on other RamNode guides, including our Docker Basics and Grafana + Prometheus tutorials, which are linked in the relevant steps below.

    1

    Prerequisites

    Recommended VPS Specifications

    Component   Minimum            Recommended
    CPU         2 vCPUs            4 vCPUs
    RAM         2 GB               4 GB
    Storage     20 GB SSD          40 GB+ SSD
    OS          Ubuntu 22.04 LTS   Ubuntu 24.04 LTS
    💡 Tip: A RamNode KVM VPS with 4 GB RAM is ideal for running the full observability stack alongside a small to medium application workload. For high-throughput environments, consider scaling to 8 GB. View Cloud VPS Plans →

    Software Requirements

    • Docker Engine 24.0+ and Docker Compose v2 — see our Docker guide
    • curl and wget for downloading binaries
    2

    Initial Server Setup

    Update your system and install Docker:

    Update & Install Prerequisites
    sudo apt update && sudo apt upgrade -y
    sudo apt install -y ca-certificates curl gnupg lsb-release
    Install Docker
    curl -fsSL https://get.docker.com | sh
    sudo usermod -aG docker $USER
    newgrp docker
    Verify Installation
    docker --version
    docker compose version

    For a detailed Docker walkthrough, see our Docker Basics Guide.

    3

    Architecture Overview

    Component Breakdown

    Component        Role                                         Default Port
    OTel Collector   Receives, processes, and exports telemetry   4317 (gRPC), 4318 (HTTP)
    Prometheus       Time-series metrics storage and querying     9090
    Jaeger           Distributed trace storage and UI             16686 (UI), 14250 (gRPC)
    Grafana          Visualization dashboards and alerting        3000

    Data Flow

    Your application sends traces and metrics to the OpenTelemetry Collector via gRPC or HTTP. The Collector processes the data (batching, filtering, enriching) and exports it to the appropriate backends — metrics go to Prometheus, traces go to Jaeger. Grafana connects to each backend as a data source, providing unified dashboards across all three telemetry signals.
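If you want to see exactly what the Collector's HTTP receiver accepts, here is a minimal sketch of an OTLP/JSON trace payload you could POST to port 4318 yourself. The service name, span name, and IDs below are illustrative values, not anything this guide defines:

```python
import json

# Minimal OTLP/JSON trace payload for the Collector's HTTP receiver.
# Trace/span IDs are lowercase hex strings; timestamps are Unix
# nanoseconds encoded as strings.
payload = {
    "resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "smoke-test"}}
        ]},
        "scopeSpans": [{
            "spans": [{
                "traceId": "5b8aa5a2d2c872e8321cf37308d69df2",
                "spanId": "051581bf3cb55c13",
                "name": "manual-test-span",
                "kind": 1,  # SPAN_KIND_INTERNAL
                "startTimeUnixNano": "1700000000000000000",
                "endTimeUnixNano": "1700000001000000000",
            }]
        }]
    }]
}

body = json.dumps(payload)
# With the stack running, send it with:
#   curl -X POST http://localhost:4318/v1/traces \
#        -H 'Content-Type: application/json' -d "$body"
```

A span posted this way should appear in the Jaeger UI under the "smoke-test" service, which makes it a handy pipeline check before instrumenting a real application.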

    4

    Deploy with Docker Compose

    Create Project Directory
    mkdir -p ~/otel-stack/{config,data/prometheus,data/grafana}
    cd ~/otel-stack
    docker-compose.yml
    
    services:
      otel-collector:
        image: otel/opentelemetry-collector-contrib:latest
        container_name: otel-collector
        command: ['--config=/etc/otel/config.yaml']
        volumes:
          - ./config/otel-collector.yaml:/etc/otel/config.yaml:ro
        ports:
          - '4317:4317'     # OTLP gRPC
          - '4318:4318'     # OTLP HTTP
          - '8888:8888'     # Collector metrics
        restart: unless-stopped
    
      prometheus:
        image: prom/prometheus:latest
        container_name: prometheus
        volumes:
          - ./config/prometheus.yml:/etc/prometheus/prometheus.yml:ro
          - ./data/prometheus:/prometheus
        ports:
          - '9090:9090'
        restart: unless-stopped
    
      jaeger:
        image: jaegertracing/all-in-one:latest
        container_name: jaeger
        environment:
          - COLLECTOR_OTLP_ENABLED=true
        ports:
          - '16686:16686'    # Jaeger UI
          - '14250:14250'    # gRPC
        restart: unless-stopped
    
      grafana:
        image: grafana/grafana:latest
        container_name: grafana
        volumes:
          - ./data/grafana:/var/lib/grafana
        ports:
          - '3000:3000'
        environment:
          - GF_SECURITY_ADMIN_PASSWORD=changeme
        restart: unless-stopped
    ⚠️ Important: Replace GF_SECURITY_ADMIN_PASSWORD with a strong, unique password before deploying to production.
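One way to keep the password out of the Compose file is an environment file. Compose automatically reads a `.env` file placed next to `docker-compose.yml`; the variable name below is an illustrative choice:

```yaml
# .env (add to .gitignore)
GRAFANA_ADMIN_PASSWORD=use-a-long-random-value

# docker-compose.yml excerpt — reference the variable instead of a literal
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}
```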
    5

    Configure the OpenTelemetry Collector

    The Collector is the central hub. Its configuration defines receivers, processors, and exporters.

    config/otel-collector.yaml
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
      hostmetrics:
        collection_interval: 30s
        scrapers:
          cpu: {}
          memory: {}
          disk: {}
          network: {}
          load: {}
    
    processors:
      batch:
        send_batch_size: 1024
        timeout: 5s
      memory_limiter:
        check_interval: 5s
        limit_mib: 512
        spike_limit_mib: 128
      resourcedetection:
        detectors: [system]
        system:
          hostname_sources: [os]
    
    exporters:
      prometheus:
        endpoint: 0.0.0.0:8889
        namespace: otel
      otlp/jaeger:
        endpoint: jaeger:4317
        tls:
          insecure: true
      debug:
        verbosity: basic
    
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp/jaeger, debug]
        metrics:
          receivers: [otlp, hostmetrics]
          processors: [memory_limiter, resourcedetection, batch]
          exporters: [prometheus]

    Key Configuration Decisions

    • Memory limiter: Prevents the Collector from consuming excessive RAM. On a 4 GB VPS, capping at 512 MiB with a 128 MiB spike limit provides a good balance.
    • Batch processor: Batching reduces network overhead and improves backend write performance. A batch size of 1024 with a 5-second timeout works well for most workloads.
    • Host metrics: The hostmetrics receiver provides system-level CPU, memory, disk, and network metrics without any application instrumentation.
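The limiter's two thresholds interact: the Collector enforces a hard cap at limit_mib and starts refusing data and forcing garbage collection at the soft limit, which is limit_mib minus spike_limit_mib. A quick sanity check of the values above; the container allowance is an assumed example:

```python
# Sanity-check the memory_limiter settings from the config above.
limit_mib = 512        # hard limit
spike_limit_mib = 128  # headroom reserved for allocation bursts

# Soft limit: the point at which the collector begins refusing data.
soft_limit_mib = limit_mib - spike_limit_mib

# Assumed Docker memory cap for the collector container (illustrative).
container_mib = 1024

# Keep the hard limit comfortably under the container cap so the
# limiter triggers before the OOM killer does.
assert limit_mib < container_mib
print(soft_limit_mib)  # prints 384
```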
    6

    Configure Prometheus

    Prometheus scrapes the metrics endpoint exposed by the Collector.

    config/prometheus.yml
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    
    scrape_configs:
      - job_name: 'otel-collector'
        static_configs:
          - targets: ['otel-collector:8889']
    
      - job_name: 'collector-internal'
        static_configs:
          - targets: ['otel-collector:8888']

    The first job collects application and host metrics (port 8889). The second scrapes the Collector's own internal metrics (port 8888) for monitoring pipeline health.
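Once both targets are up, you can try a couple of queries in the Prometheus UI. Exact metric names vary with the Collector version and reflect the `otel` namespace configured above, so treat these as likely names to adapt:

```promql
# Host CPU time by state, from the hostmetrics receiver
rate(otel_system_cpu_time_seconds_total[5m])

# Spans accepted by the Collector's pipeline (internal metrics, port 8888)
rate(otelcol_receiver_accepted_spans[5m])
```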

    7

    Launch the Stack

    Start All Services
    cd ~/otel-stack
    docker compose up -d
    Verify Running Containers
    docker compose ps
    Check Collector Logs
    docker compose logs otel-collector --tail 50

    Verify Endpoints

    Service            URL                           Expected
    Grafana            http://YOUR_IP:3000           Login page
    Jaeger UI          http://YOUR_IP:16686          Search page
    Prometheus         http://YOUR_IP:9090           Query interface
    Collector Health   http://YOUR_IP:8888/metrics   Metrics output
    8

    Instrument a Sample Application

    To verify the full pipeline, instrument a simple Python Flask application. The same principles apply to any language supported by OpenTelemetry SDKs.

    Install OTel Python Packages
    pip install flask \
      opentelemetry-api \
      opentelemetry-sdk \
      opentelemetry-instrumentation-flask \
      opentelemetry-exporter-otlp
    app.py — Flask with OTel Instrumentation
    from flask import Flask
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter \
        import OTLPSpanExporter
    from opentelemetry.instrumentation.flask \
        import FlaskInstrumentor
    from opentelemetry.sdk.resources import Resource
    
    # Configure the tracer
    resource = Resource.create({'service.name': 'my-flask-app'})
    provider = TracerProvider(resource=resource)
    exporter = OTLPSpanExporter(
        endpoint='http://localhost:4317',
        insecure=True
    )
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)
    
    app = Flask(__name__)
    FlaskInstrumentor().instrument_app(app)
    
    @app.route('/')
    def hello():
        return 'Hello from OpenTelemetry!'
    
    @app.route('/health')
    def health():
        return {'status': 'ok'}
    
    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=5000)
    Test the Application
    python app.py &
    curl http://localhost:5000/
    curl http://localhost:5000/health

    After sending requests, open the Jaeger UI at http://YOUR_IP:16686 and select "my-flask-app" from the service dropdown. You should see traces for each HTTP request.

    9

    Set Up Grafana Dashboards

    Grafana provides the visualization layer. For a comprehensive walkthrough of Grafana setup, see our Grafana + Prometheus Guide.

    Adding Data Sources

    • Prometheus: Navigate to Configuration → Data Sources → Add data source. Select Prometheus and set the URL to http://prometheus:9090. Click Save & Test.
    • Jaeger: Add another data source, select Jaeger, and set the URL to http://jaeger:16686. This enables trace visualization within Grafana panels.

    Recommended Dashboards

    • Node Exporter Full (Dashboard ID 1860) for system-level metrics
    • OpenTelemetry Collector (Dashboard ID 15983) for pipeline health
    • Custom application dashboards using PromQL queries against your OTel metrics

    To import a community dashboard, go to Dashboards → Import, enter the dashboard ID, select your Prometheus data source, and click Import.
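Data sources can also be provisioned as code rather than clicked together in the UI. A sketch of Grafana's provisioning format, mounted into the container at /etc/grafana/provisioning/datasources/ (the filename is your choice):

```yaml
# config/grafana-datasources.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  - name: Jaeger
    type: jaeger
    access: proxy
    url: http://jaeger:16686
```

Add the mount to the grafana service in docker-compose.yml and the data sources appear on first start, which keeps the setup reproducible.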

    10

    Firewall & Security Hardening

    UFW Configuration
    # Allow SSH
    sudo ufw allow 22/tcp
    
    # Allow OTel Collector (restrict to app servers)
    sudo ufw allow from YOUR_APP_IP to any port 4317
    sudo ufw allow from YOUR_APP_IP to any port 4318
    
    # Allow Grafana (restrict to your IP)
    sudo ufw allow from YOUR_ADMIN_IP to any port 3000
    
    # Block public access to Prometheus and Jaeger
    # Access via Grafana data sources instead
    
    sudo ufw enable

    Additional Security Measures

    • Place Grafana behind a reverse proxy (Nginx or Caddy) with TLS
    • Enable Grafana authentication with OAuth or LDAP for team access
    • Set resource limits on Docker containers to prevent runaway memory usage
    • Rotate the Grafana admin password and store credentials in environment files excluded from version control
    • Use Docker network isolation so Prometheus and Jaeger are only accessible within the Compose network
    • Be aware that Docker publishes ports via iptables directly, which can bypass UFW rules; bind ports that should stay internal to localhost (for example '127.0.0.1:9090:9090')
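The container-level limits mentioned above can be expressed directly in the Compose file. The values below are illustrative and should track the sizing table in the next section:

```yaml
# docker-compose.yml excerpt — per-container memory caps
  otel-collector:
    deploy:
      resources:
        limits:
          memory: 768M   # keep above the Collector's memory_limiter setting
  prometheus:
    deploy:
      resources:
        limits:
          memory: 1G
```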
    11

    Resource Tuning for VPS Environments

    Memory Allocation Guidelines

    Component        2 GB VPS   4 GB VPS   8 GB VPS
    OTel Collector   256 MiB    512 MiB    1 GiB
    Prometheus       512 MiB    1 GiB      2 GiB
    Jaeger           256 MiB    512 MiB    1 GiB
    Grafana          128 MiB    256 MiB    512 MiB

    Prometheus Retention Settings

    Adjust storage retention to match your available disk space. Add these flags to the Prometheus service:

    Prometheus Retention Flags
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=15d'
      - '--storage.tsdb.retention.size=5GB'
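To pick sensible retention values, a rough disk estimate helps: Prometheus typically stores on the order of 1-2 bytes per sample after compression. The series count below is an assumed example for a small stack:

```python
# Back-of-the-envelope Prometheus disk usage estimate.
active_series = 5000      # assumed: host metrics plus a few small apps
scrape_interval_s = 15    # from prometheus.yml above
retention_days = 15       # from --storage.tsdb.retention.time
bytes_per_sample = 2      # conservative post-compression figure

samples_per_day = active_series * (86_400 // scrape_interval_s)
total_bytes = samples_per_day * retention_days * bytes_per_sample
print(f"~{total_bytes / 1e9:.1f} GB")  # prints "~0.9 GB"
```

At this scale the 5 GB size cap above leaves ample margin; rerun the arithmetic with your own series count before tightening it.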

    Reducing Collector Overhead

    For low-traffic applications, increase the batch timeout and reduce scrape frequency:

    Low-Traffic Batch Config
    processors:
      batch:
        send_batch_size: 512
        timeout: 10s
    12

    Troubleshooting

    Common Issues

    Collector fails to start: Check the config YAML for syntax errors using docker compose logs otel-collector. The most common issue is incorrect indentation in pipeline definitions.
    No metrics in Prometheus: Check the targets page at http://YOUR_IP:9090/targets. If the otel-collector target shows as down, confirm the Collector's Prometheus exporter is listening on port 8889.
    No traces in Jaeger: Confirm your application sends data to the correct Collector endpoint (port 4317 for gRPC or 4318 for HTTP). Check Collector logs for export errors.
    High memory usage: Lower the memory_limiter thresholds and add Docker memory limits using deploy.resources.limits.memory in your Compose file.
    Grafana cannot reach data sources: Use Docker service names (prometheus, jaeger) rather than localhost in Grafana data source URLs.

    Useful Diagnostic Commands

    Container Resource Usage
    docker stats --no-stream
    Test OTLP Endpoint Connectivity
    curl -v http://localhost:4318/v1/traces
    View Collector Internal Metrics
    curl http://localhost:8888/metrics | grep otel
    Restart a Single Service
    docker compose restart otel-collector

    Next Steps

    • Add Loki for centralized log aggregation, completing the three pillars of observability
    • Set up Grafana alerting rules for latency spikes, error rates, or resource exhaustion
    • Instrument additional services and use trace context propagation across microservices
    • Implement the OTel Collector's tail_sampling processor to reduce storage costs
    • Explore Grafana Tempo as an alternative to Jaeger for Grafana-native tracing
    • Automate the deployment with Ansible or Terraform for reproducible infrastructure