
    Diagnosing High CPU Usage

    A comprehensive guide to identifying, understanding, and resolving CPU performance issues on Linux servers.

    Applies to: All RamNode VPS Plans | Ubuntu / Debian / CentOS / AlmaLinux | Rev. 2025

    High CPU usage is one of the most common performance complaints on a VPS. The root cause can range from a legitimate spike in traffic to a runaway process, a poorly optimized script, or a compromised server silently mining cryptocurrency. This guide walks through a structured diagnostic process: how to spot the problem, understand what the numbers mean, and decide on the appropriate response.

    1. Using top and htop to Identify Offending Processes

    top — The Universal Starting Point

    The top command is available on every Linux system without installation. Launch it with:

    Launch top
    top

    Key columns to focus on:

    Column    What It Tells You
    -------   -----------------
    %CPU      CPU percentage consumed by this process. Can exceed 100% on multi-core systems (200% = 2 full cores)
    %MEM      Percentage of physical RAM in use
    PID       Process ID, required for kill commands and deeper investigation
    USER      The account running the process. www-data or nobody often indicates a web process; root may indicate a system task
    COMMAND   The executable name. Start top with -c or press c to toggle the full command line
    TIME+     Cumulative CPU time consumed since the process started; useful for spotting runaway processes

    Useful top shortcuts: Press P to sort by CPU. Press M to sort by memory. Press k to kill a process by PID. Press 1 to expand per-core CPU view.

    htop — An Interactive Alternative

    htop provides a more readable, color-coded interface with mouse support. Install it if not already present:

    Install htop — Debian/Ubuntu
    apt install htop -y
    Install htop — CentOS/AlmaLinux
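    # htop is packaged in EPEL on CentOS/AlmaLinux, so enable that repository first
    dnf install epel-release -y
    dnf install htop -y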
    Launch htop
    htop

    htop advantages over top:

    • Horizontal bars at the top show per-core utilization at a glance — immediately reveals whether one core is pegged vs. all cores under load
    • Scroll through the process list and kill processes without memorizing keyboard shortcuts
    • F4 (Filter) narrows the list to a specific process name, e.g., php-fpm or python3
    • Tree view (F5) shows parent/child relationships, revealing which master process spawned multiple workers

    TIP: On a fresh VPS you may not have htop installed. top is always available and is sufficient for initial triage.

    2. Understanding Load Average vs. CPU Count

    What Load Average Actually Means

    The three load average numbers shown in top and /proc/loadavg represent the average number of runnable (or uninterruptible) tasks over the past 1 minute, 5 minutes, and 15 minutes respectively. A common misconception is that these numbers directly map to CPU percentage — they do not.

    View load average
    uptime
    # Output: 14:32:11 up 22 days, load average: 2.41, 1.87, 1.55
    
    cat /proc/loadavg
    # Output: 2.41 1.87 1.55 3/312 18842

    Interpreting Load Relative to CPU Count

    The key rule of thumb: a load average equal to the number of logical CPUs corresponds to roughly 100% utilization. A load higher than your CPU count means tasks are waiting, usually for CPU time but, because Linux also counts uninterruptible tasks, sometimes for I/O.

    Check CPU count
    nproc
    # Or get more detail
    lscpu | grep '^CPU(s):'

    Scenario              Interpretation
    -------------------   --------------
    Load = CPU count      System is fully utilized; acceptable if short-lived
    Load < CPU count      Headroom exists; CPU is not the bottleneck
    Load 2× CPU count     Significant pressure; processes queuing for CPU time
    Load 4×+ CPU count    Severe overload; expect sluggish SSH and slow response times

    WARNING: On a VPS plan with 1 vCPU, a load average of 1.0 means the CPU is fully utilized. A load of 2.0 means that, on average, one task is running while another waits. This is why high load on small VPS plans causes noticeable degradation faster than on dedicated servers.
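The comparison of load against CPU count can be sketched as a small shell function. This is a minimal illustration, not part of any standard tool: the name load_status is hypothetical, and the label for ratios between 1 and 2 is an assumption added to fill the gap between the bands described above.

```shell
# Hypothetical helper: classify a load average against a CPU count.
# Usage: load_status LOAD CPU_COUNT
load_status() {
    awk -v l="$1" -v c="$2" 'BEGIN {
        r = l / c                                   # load per logical CPU
        if      (r >= 4) s = "severe overload"
        else if (r >= 2) s = "significant pressure"
        else if (r >= 1) s = "at or over capacity"  # assumed label for 1x to 2x
        else             s = "headroom"
        printf "%.2f per CPU: %s\n", r, s
    }'
}

load_status 2.0 1   # the 1 vCPU case: prints "2.00 per CPU: significant pressure"
```

On a live Linux system the inputs could come from the commands shown earlier, for example load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)".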

    Reading the Trend

    Compare all three numbers together, remembering that the first is the most recent (1-minute) average and the last is the oldest (15-minute). A load of 8.0 / 4.0 / 2.0 on a 4-core system is increasing: something is accumulating and requires immediate attention. A load of 1.5 / 3.0 / 4.5 is decreasing: the spike may be passing.
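Since the three figures are ordered 1-minute, 5-minute, 15-minute, the direction of the trend comes from comparing the first against the last. A minimal sketch, with load_trend as a hypothetical name:

```shell
# Hypothetical helper: report whether load is rising or falling.
# Usage: load_trend ONE_MIN FIVE_MIN FIFTEEN_MIN
load_trend() {
    awk -v one="$1" -v fifteen="$3" 'BEGIN {
        if      (one > fifteen) print "rising"    # recent load exceeds the older average
        else if (one < fifteen) print "falling"   # recent load is below the older average
        else                    print "steady"
    }'
}

load_trend 2.41 1.87 1.55   # values from the uptime example: prints "rising"
load_trend 0.50 1.20 3.00   # prints "falling"
```

The 5-minute value is accepted but ignored here; it is mainly useful for judging how quickly the change is happening.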

    3. Distinguishing User, System, and I/O Wait

    The CPU Breakdown in top

    The summary line beginning with %Cpu(s) in top breaks CPU usage into several categories. Press 1 to expand to per-core view:

    Example CPU breakdown
    %Cpu(s):  45.2 us,  8.1 sy,  0.0 ni, 38.5 id,  7.9 wa,  0.0 hi,  0.2 si,  0.0 st

    Field          What It Means
    ------------   -------------
    us (user)      CPU time spent running user-space code. High values point to application-level workloads: PHP, Python, Node.js, etc.
    sy (system)    CPU time in kernel-space. High values suggest frequent system calls: disk I/O, network operations, or context switching.
    ni (nice)      CPU time for lower-priority user-space processes. Usually low.
    id (idle)      Remaining free CPU. Subtract from 100 to get rough total utilization.
    wa (iowait)    Time the CPU waited for I/O. High wa (above 20–30%) suggests a disk bottleneck rather than a true CPU problem.
    hi (hw irq)    Hardware interrupt requests; usually near zero unless network traffic is heavy.
    si (sw irq)    Software interrupts. High values can indicate heavy network processing.
    st (steal)     CPU time stolen by the hypervisor. Persistent steal above 5–10% suggests overselling.

    Diagnostic Decision Tree

    • High us (user): Application code is the problem. Find which process via top → P sort, then profile the application.
    • High sy (system): Kernel is doing heavy work. Check for excessive forks, context switching, or filesystem churn with vmstat 1 5.
    • High wa (iowait): I/O is blocking. Use iostat -x 1 5 or iotop to find the disk-hungry process. Do not confuse with a CPU problem.
    • High st (steal): The hypervisor is running other guests while your VPS waits for CPU time. Consider upgrading your plan or contacting support if steal is consistently elevated.
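The first step of the decision tree, finding which category dominates, can be sketched by parsing the %Cpu(s) summary line. This assumes the "value label" layout shown in the example above and a locale that prints decimal points; cpu_dominant is a hypothetical name.

```shell
# Hypothetical helper: read one "%Cpu(s): ..." line on stdin and print the
# largest non-idle category with its percentage.
cpu_dominant() {
    sed 's/^%Cpu(s): *//' | tr ',' '\n' | awk '
        BEGIN { max = 0 }
        # Each record looks like "45.2 us"; skip the idle field
        $2 != "id" && $1 + 0 > max { max = $1 + 0; name = $2 }
        END { if (name == "") print "idle"; else print name, max }'
}

echo '%Cpu(s):  45.2 us,  8.1 sy,  0.0 ni, 38.5 id,  7.9 wa,  0.0 hi,  0.2 si,  0.0 st' | cpu_dominant
# prints "us 45.2"
```

On a live system the input could come from batch mode, for example LC_ALL=C top -b -n 1 | grep '^%Cpu' | cpu_dominant. The output then points to the matching branch of the decision tree.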

    4. Common Culprits

    Runaway PHP Processes

    PHP-FPM worker processes are one of the most frequent CPU offenders on VPS stacks running WordPress, WooCommerce, or Drupal. A slow database query, an infinite loop in a plugin, or a traffic surge can cause workers to pile up.

    Inspect PHP-FPM workers
    # Count active PHP-FPM workers
    ps aux | grep php-fpm | grep -v grep | wc -l
    
    # See which PHP processes are consuming CPU
    ps aux --sort=-%cpu | grep php | head -20
    
    # Check PHP-FPM pool status (if the status endpoint is enabled; the path is set by pm.status_path)
    curl 'http://127.0.0.1/status?full'

    Warning signs specific to PHP:

    • Dozens of php-fpm workers all stuck at the same CPU% with identical memory footprint
    • Workers accumulating over time without dying (check TIME+ in top — values above several minutes are suspicious)
    • Access logs showing a flood of POST requests to /xmlrpc.php or wp-login.php — bot traffic triggering PHP execution

    WARNING: A common PHP-FPM trap: setting pm.max_children too high consumes all available RAM, which causes the kernel to swap, which causes iowait to spike, making the system appear CPU-bound when it is actually memory-bound.
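One way to avoid this trap is to bound pm.max_children by memory. The sketch below uses a common rule of thumb that is an assumption here, not something this guide prescribes: divide the RAM left after other services by the average worker size. The function name and all figures are hypothetical.

```shell
# Hypothetical sizing helper; all arguments are in MiB.
# Usage: estimate_max_children TOTAL_RAM RESERVED_FOR_OTHER_SERVICES AVG_WORKER_SIZE
estimate_max_children() {
    total_mb="$1"; reserved_mb="$2"; worker_mb="$3"
    # Integer division rounds down, which errs on the safe side
    echo $(( (total_mb - reserved_mb) / worker_mb ))
}

# Example: 2048 MiB VPS, 768 MiB kept for the database and OS, 64 MiB per worker
estimate_max_children 2048 768 64   # prints 20
```

The average worker size should be measured on the server itself, for instance from the RSS column of ps aux for the php-fpm workers, rather than guessed.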

    Stuck or Looping Cron Jobs

    Cron jobs that fail to exit, are scheduled too frequently, or run longer than their interval can stack up and consume significant CPU.

    Check cron-related processes
    # Check running cron-related processes
    ps aux | grep -E '(cron|curl|wget|php|python|bash)' | grep -v grep
    
    # View all user crontabs (run as root)
    for user in $(cut -f1 -d: /etc/passwd); do crontab -u "$user" -l 2>/dev/null; done
    
    # Check system-level cron jobs
    cat /etc/crontab
    ls /etc/cron.d/ /etc/cron.hourly/ /etc/cron.daily/

    Symptoms of a stuck cron job:

    • Multiple instances of the same script in ps output with increasing PID numbers
    • Process has a high TIME+ value relative to how long it should legitimately run
    • CPU spikes occur on a predictable schedule correlating with a cron entry
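The first symptom, several instances of the same script, can be checked by counting identical command lines. A minimal sketch; find_duplicates is a hypothetical name and the paths in the example are made up.

```shell
# Hypothetical helper: read command lines on stdin (one per line) and print
# those that appear more than once, most frequent first.
find_duplicates() {
    sort | uniq -c | awk '$1 > 1' | sort -rn
}

# Example with made-up command lines; only the repeated cron script is printed
printf '%s\n' \
    '/usr/bin/php /var/www/cron.php' \
    '/usr/bin/php /var/www/cron.php' \
    '/usr/sbin/sshd -D' | find_duplicates
```

On a live system, ps -eo args= | find_duplicates feeds it the real process list. Multi-worker daemons such as php-fpm or nginx will legitimately appear in the output, so look for scripts that should only ever run once.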

    Cryptocurrency Miners from Compromised Servers

    Cryptocurrency miners are frequently deployed on compromised servers via vulnerabilities in web applications, exposed Docker APIs, or weak SSH credentials. They typically manifest as sustained 80–100% CPU usage.

    Detect cryptocurrency miners
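    # Look for known miner process names
    ps aux | grep -iE '(xmrig|minerd|cpuminer)' | grep -v grep
    
    # Look for processes posing as kernel threads: genuine kworker threads have kthreadd (PID 2) as their parent
    ps -eo pid,ppid,user,comm | awk '$4 ~ /kworker|kthreadd/ && $1 != 2 && $2 != 2'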
    
    # Check for processes with no associated file on disk (deleted binaries)
    ls -la /proc/*/exe 2>/dev/null | grep deleted
    
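    # Listening sockets bound to non-loopback addresses
    ss -tulpn | grep -v '127.0.0.1'
    
    # Established TCP connections with owning processes; check the remote end for any high-CPU process
    ss -tnp state established
    netstat -antp | grep ESTABLISHED    # equivalent, if net-tools is installed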
    
    # Check for executable files in common drop locations (add -mtime -7 to limit to the past week)
    find /tmp /var/tmp /dev/shm -type f -executable 2>/dev/null
    
    # Inspect process binary path (replace PID)
    ls -la /proc/PID/exe
    cat /proc/PID/cmdline | tr '\0' ' '

    DANGER: Miners often disguise themselves with names that resemble legitimate system processes such as kworker, sshd, or java. Always verify suspicious high-CPU processes by checking their actual binary path via /proc/PID/exe rather than trusting the COMMAND column alone.

    Red flags that suggest a miner rather than a legitimate process:

    • Process binary resolves to /tmp, /dev/shm, or a hidden directory
    • CPU usage is consistently high (85–99%) across all cores, sustained over hours
    • The process was started recently but the server has been running for weeks
    • Active outbound connections on non-standard ports (TCP 3333, 4444, 5555, 7777, 14444 — common mining pool ports)
    • No log entries or shell history explaining when or how the process started
    • crontab, /etc/rc.local, or systemd unit files contain entries pointing to the executable
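The first red flag, a binary that resolves to a drop location, can be checked for every running process at once. This is a sketch under assumptions: both function names are hypothetical, and a match is only a prompt for review, since some legitimate software lives in hidden directories and binaries show as deleted after package upgrades.

```shell
# Hypothetical helper: succeed if a resolved binary path looks suspicious.
is_suspicious_path() {
    case "$1" in
        /tmp/*|/var/tmp/*|/dev/shm/*) return 0 ;;   # world-writable drop locations
        */.*)                         return 0 ;;   # hidden file or directory in the path
        *" (deleted)")                return 0 ;;   # binary removed from disk after launch
        *)                            return 1 ;;
    esac
}

# Hypothetical helper: check every process; run as root to see all of them.
scan_proc_exes() {
    for exe in /proc/[0-9]*/exe; do
        path=$(readlink "$exe" 2>/dev/null) || continue   # kernel threads have no exe
        if is_suspicious_path "$path"; then
            echo "${exe%/exe} -> $path"
        fi
    done
}
```

Calling scan_proc_exes prints one line per flagged process, such as a /proc/PID entry followed by the path it resolves to, which can then be inspected with the commands above.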

    5. When to Kill vs. When to Investigate

    Decision Framework

    The appropriate response depends on whether you understand what the process is and whether it is expected. Killing first and investigating later is reasonable in a production emergency — but investigation must still follow.

    Scenario: Kill Immediately. Indicators:
    • Process binary is in /tmp or /dev/shm
    • Active connections to unknown mining pool IPs
    • Duplicate cron processes stacking up without end
    • PHP/Python process consuming CPU for 30+ minutes with no user-facing request to justify it

    Scenario: Investigate First. Indicators:
    • Known process (mysql, nginx, php-fpm) under unexpected load
    • System during a backup or indexing window
    • High iowait rather than user/sys CPU (different root cause)
    • Single spike vs. sustained elevation (spikes may self-resolve)

    How to Kill a Process Safely

    Kill process commands
    # Graceful termination (allows process to clean up) — try first
    kill -15 PID
    
    # Force kill (use if -15 has no effect after a few seconds)
    kill -9 PID
    
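    # Kill all processes matching a name (sends SIGTERM by default; add -9 only if they ignore it)
    pkill processname
    
    # Restart PHP-FPM through its service unit, which replaces the master and all workers
    # (the unit name depends on the distribution and PHP version, e.g. php8.1-fpm or php-fpm)
    systemctl restart php8.1-fpm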

    DANGER: Sending kill -9 to a database process (MySQL, PostgreSQL, Redis) without a graceful shutdown can corrupt data files or require crash recovery on the next start. Use systemctl stop servicename instead, which sends the correct signal sequence.
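The "try -15 first, then -9" sequence can be wrapped in one function so the grace period is not skipped under pressure. A minimal sketch, with graceful_kill as a hypothetical name. It is meant for stray processes, not for databases or other managed services, where systemctl stop remains the right tool.

```shell
# Hypothetical helper: send SIGTERM, wait up to WAIT_SECS, then fall back to SIGKILL.
# Usage: graceful_kill PID [WAIT_SECS]   (WAIT_SECS defaults to 5)
graceful_kill() {
    pid="$1"; wait_secs="${2:-5}"; i=0
    kill -15 "$pid" 2>/dev/null || return 1       # no such process, or not permitted
    while [ "$i" -lt "$wait_secs" ]; do
        kill -0 "$pid" 2>/dev/null || return 0    # signal 0 only tests existence: process is gone
        sleep 1
        i=$((i + 1))
    done
    kill -9 "$pid" 2>/dev/null || true            # still present: force kill (already gone is fine)
}
```

For example, graceful_kill 18842 10 gives the process with PID 18842 (an illustrative number) ten seconds to clean up before it is forced to exit.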

    Post-Kill Investigation Steps

    Whether the process was legitimate or malicious, document what happened and prevent recurrence:

    • For PHP/application spikes: Review slow query logs (/var/log/mysql/slow.log), PHP-FPM access logs, and application error logs in the timeframe surrounding the spike.
    • For stuck cron jobs: Add a lock mechanism using flock to prevent concurrent execution, and set a maximum runtime with timeout prefix in the crontab entry.
    • For suspected miners/compromise: Do not just kill the process. Check for persistence mechanisms in crontab, /etc/rc.local, /etc/cron.d/, and systemd unit files. Consider the server compromised until proven otherwise and review Fail2ban and auth.log for unauthorized access.
    • In all cases: Record the PID, binary path, user, and network connections before terminating if possible. Use cp /proc/PID/exe /tmp/evidence.bin to preserve the binary for analysis.
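The flock and timeout advice for cron jobs can be sketched as a small wrapper. The function name, lock file, script path, and time limits are all hypothetical; flock comes from util-linux and timeout from coreutils.

```shell
# Hypothetical wrapper: run a command under an exclusive lock with a hard runtime limit.
# Usage: run_locked LOCKFILE MAX_SECONDS COMMAND [ARGS...]
run_locked() {
    lock="$1"; max="$2"
    shift 2
    # flock -n: give up at once (exit status 1) if another instance holds the lock
    # timeout:  send SIGTERM to the command after MAX_SECONDS (exit status 124)
    flock -n "$lock" timeout "$max" "$@"
}

# The same idea written directly in a crontab entry (illustrative paths):
# */5 * * * * flock -n /var/lock/myjob.lock timeout 240 /usr/local/bin/myjob.sh
```

Setting the timeout slightly below the cron interval, as in the 240-second limit on a 5-minute schedule, means a hung run is terminated before the next one is due.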

    6. Quick Reference: Diagnostic Commands

    Command                           Purpose
    -------------------------------   -------
    top -b -n 1                       Single-snapshot output to stdout; useful for logging
    ps aux --sort=-%cpu | head        Top CPU-consuming processes sorted at point-in-time
    htop                              Interactive process viewer with per-core bars
    uptime                            Load averages and system uptime
    nproc                             Number of logical CPUs (denominator for load averages)
    vmstat 1 5                        CPU breakdown, context switches, and memory per second for 5 samples
    iostat -x 1 5                     Per-disk I/O stats to confirm or rule out iowait as root cause
    iotop                             Real-time per-process disk I/O (requires root)
    ss -tulpn                         Open ports and associated processes
    netstat -antp                     All TCP connections with PIDs; helps identify miner pool connections
    find /tmp /var/tmp -executable    Scan for executable files in world-writable directories
    ls -la /proc/PID/exe              Resolve true binary path for a process
    strace -p PID                     Trace system calls of a live process
    lsof -p PID                       List all files and sockets open by a process

    Need more help? Open a support ticket at my.ramnode.com or consult the RamNode knowledge base for additional guides on server hardening, PHP-FPM tuning, and incident response procedures.