Ubuntu CPU 100% Troubleshooting: top/ps/load average Guide

CPU High Load Troubleshooting - top/ps

What You'll Learn

  • How to quickly identify the process driving CPU to 100%
  • How to distinguish between "true CPU issue" vs "I/O wait" or "swap thrashing"
  • How to save evidence (logs and readings) before restarting, so you can work on prevention afterward

Quick Summary

When CPU is high, follow this order:

  1. Overall status: uptime (load average)
  2. Find the culprit: top (sort by CPU, also check %us/%sy/%wa)
  3. Snapshot the top CPU consumers: ps aux --sort=-%cpu | head
  4. If it's a service, check logs: systemctl status / journalctl -u
  5. Rule out "looks like CPU but isn't": I/O wait (wa) / memory shortage (swap)

Table of Contents

  1. Two Types of "CPU 100%"
  2. uptime for load average
  3. top for "Culprit" and "CPU Breakdown"
  4. ps for CPU Top List
  5. Services: systemctl / journalctl
  6. Rule Out "Actually Not CPU"
  7. Common Cause Patterns
  8. Things to Avoid

Prerequisites

  • OS: Ubuntu
  • Target: Server beginners
  • sudo access
  • Goal: Isolation → Root cause → A starting point for prevention

1. Two Types of "CPU 100%" (a Common Blind Spot)

When CPU is high, the cause is usually one of these:

Type A: The CPU Really Is Doing Heavy Computation

  • Calculation/loops/encryption/conversion/aggregation
  • top shows specific process eating CPU
  • %us (user CPU) is usually high

Type B: Looks Like CPU But Actually I/O Wait / Memory Shortage

  • Slow disk (log bloat, heavy DB activity, EBS degradation, I/O throttling)
  • Growing swap usage = everything slows to a crawl (memory shortage)
  • %wa (I/O wait) is usually high

"CPU 100% so add more CPU" is premature. First check %wa and swap to confirm "Is it really CPU?"

2. uptime for load average (Overall Pressure)

$ uptime

Example output:

 14:05:12 up 10 days,  3:20,  2 users,  load average: 3.52, 3.10, 2.90

What load average means (simplified for beginners):

  • With 4 CPU cores, a load around 4 means the CPU is fully occupied
  • A load of 10 on that same 4-core machine means work is queuing up badly

Note: the load average counts tasks waiting on I/O as well as tasks using the CPU, so look at top next for the breakdown.
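
To judge whether a given load value is high, compare it against the number of CPU cores. A quick check (nproc ships with coreutils on Ubuntu):

# Number of CPU cores vs. the three load averages (1 / 5 / 15 minutes)
nproc
cat /proc/loadavg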

3. top for "Culprit" and "CPU Breakdown"

$ top

3-1. top Shortcuts (Must Know)

  • P: Sort by CPU
  • M: Sort by memory (when memory is suspect)
  • 1: Show per-core CPU (check for imbalance)
  • q: Quit
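
top can also run non-interactively, which is handy for saving evidence. A minimal sketch using batch mode:

# One non-interactive refresh of top, trimmed to the header plus the top processes
top -b -n 1 | head -n 25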

3-2. Reading CPU Breakdown (%us / %sy / %wa)

These appear in the %Cpu(s) line near the top of the display:

  • %us: App computation using CPU
  • %sy: Kernel/system processing (network/disk/interrupts)
  • %wa: I/O wait (disk slow/congested)

Quick interpretation:

  • %us high → App processing is heavy (Type A)
  • %wa high → I/O is congested (Type B)
  • %sy high → System-side processing is heavy
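
To watch how us / sy / wa shift over time rather than at a single instant, vmstat (part of procps, installed by default on Ubuntu) works well:

# One sample per second, five samples; us/sy/wa/id are the CPU columns on the right
vmstat 1 5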

4. ps for CPU Top List (Save Evidence)

top is an interactive, moment-by-moment view; ps gives a one-shot text snapshot you can save as evidence.

$ ps aux --sort=-%cpu | head -n 20

Key columns to look at:

  • COMMAND: What's eating CPU
  • %CPU: Which process is the outlier

If many processes of the same type appear (worker proliferation), the likely cause is missing worker limits or excessive incoming load.
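
Counting processes grouped by command name makes worker proliferation easy to spot. A minimal sketch:

# Count running processes grouped by command name, most numerous first
ps -eo comm= | sort | uniq -c | sort -rn | head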

5. If It's a Service: systemctl / journalctl

If the process is nginx, apache2, or an app (php-fpm, node, gunicorn, etc.), logs often show the cause.

5-1. Service Status

$ sudo systemctl status nginx
$ sudo systemctl status apache2
$ sudo systemctl status php8.1-fpm

5-2. Logs (Last 200 Lines)

$ sudo journalctl -u nginx -n 200
$ sudo journalctl -u php8.1-fpm -n 200

When CPU is high, the logs often show a flood of requests, repeated errors, or a restart loop. Look for these patterns first.
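
When 200 lines are too noisy, journalctl can filter by time and severity. A sketch (the service name is just an example):

# Errors and worse from the last 10 minutes for one service
sudo journalctl -u nginx -p err --since "10 minutes ago"
# Or follow the log live while the load is happening
sudo journalctl -u nginx -f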

6. Rule Out "Actually Not CPU" (Important)

Skip this and you'll misdiagnose.

6-1. Check for Memory Shortage (Swap)

$ free -h

If swap usage keeps growing or the "available" column is small, the symptom may look like a CPU problem, but the root cause is memory.
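
If swap is in use and you want to know which processes it belongs to, here is a rough sketch that reads VmSwap from /proc:

# Top 10 swap users: "<kB swapped> <pid> <command>"
for pid in /proc/[0-9]*; do
  swap=$(awk '/^VmSwap:/ {print $2}' "$pid/status" 2>/dev/null)
  [ -n "$swap" ] && [ "$swap" -gt 0 ] && echo "$swap ${pid#/proc/} $(cat "$pid/comm" 2>/dev/null)"
done | sort -rn | head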

6-2. If %wa Is High, Suspect Disk I/O

  • Log bloat
  • DB
  • Docker layer explosion
  • Storage performance issues

Check the disk side with extended I/O statistics:

$ iostat -x 1 5

If I/O wait is the cause, adding CPU won't help; you'd just be spending money on the wrong fix.
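
Note that iostat is not installed by default on Ubuntu; it comes from the sysstat package. iotop (a separate package) shows which process is doing the I/O:

# Install the tools (Ubuntu package names)
sudo apt install -y sysstat iotop
# In iostat -x output, a device near 100 %util with long r_await/w_await times is saturated
# Show only processes actually doing I/O, three batch refreshes
sudo iotop -b -o -n 3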

7. Common Cause Patterns (By Frequency)

Pattern 1: Infinite Loop / Bug / Runaway

  • One process consuming CPU constantly (%us high)
  • Fix: Check logs, recent deployments, recent changes
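
To see what a suspected runaway process is actually doing, these are worth a look (replace 12345 with the PID from top or ps; pidstat is part of sysstat, strace is in the strace package):

# Per-process CPU usage, one sample per second for five seconds
pidstat -u -p 12345 1 5
# Count syscalls for ~10 seconds; a pure CPU-bound loop often makes almost none
sudo timeout 10 strace -c -p 12345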

Pattern 2: Bot / Scan / Traffic Surge

  • nginx/apache access logs exploding
  • Fix: Check web logs, consider rate limiting/WAF/caching
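
A quick way to confirm a bot or scan is to count requests per client IP in the access log. A sketch assuming nginx's default log path on Ubuntu:

# Top 10 client IPs by request count
sudo awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head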

Pattern 3: Too Many Workers (php-fpm, etc.)

  • Child processes keep growing, eating CPU and memory
  • Fix: Set worker limits (pm.max_children for php-fpm, MaxRequestWorkers for Apache, etc.); see the sketch below
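
For php-fpm, the limits live in the pool configuration. A sketch for inspecting them (the path assumes PHP 8.1 on Ubuntu; the values in the comments are only illustrative and depend on how much RAM each worker uses):

# Current worker settings for the default pool
grep -E '^pm' /etc/php/8.1/fpm/pool.d/www.conf
# Typical directives to tune in that file, for example:
#   pm = dynamic
#   pm.max_children = 10
#   pm.max_spare_servers = 4
# Apply with: sudo systemctl reload php8.1-fpm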

Pattern 4: I/O Is Slow, So CPU Appears High

  • %wa is high
  • Fix: Investigate disk I/O

8. Things to Avoid

Don't: Restart Without Saving Evidence

At minimum, capture these before deciding:

  • uptime
  • free -h
  • ps aux --sort=-%cpu | head
  • top (screenshot if possible)
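
A minimal sketch for bundling that evidence into one timestamped file before touching anything (the output path is just an example):

# Save a quick evidence bundle under /tmp
ts=$(date +%Y%m%d_%H%M%S)
{
  uptime
  free -h
  ps aux --sort=-%cpu | head -n 20
  top -b -n 1 | head -n 30
} > "/tmp/cpu_evidence_${ts}.txt"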

Don't: Scale Up CPU Without Checking %wa

If I/O wait is the cause, CPU scaling is a waste (opportunity cost).

Don't: Leave Unlimited Worker Settings Alone

The system will fall over the moment traffic increases. Set limits first.

Copy-Paste Template

# 1) Overview
uptime

# 2) Also check memory (rule out "actually not CPU")
free -h

# 3) CPU top list (evidence)
ps aux --sort=-%cpu | head -n 20

# 4) top for breakdown (%us/%sy/%wa) and culprit
top

# 5) If it's a service, check logs
sudo systemctl status <service>
sudo journalctl -u <service> -n 200

Summary

  • CPU high: first distinguish "true CPU" vs "I/O wait / memory shortage"
  • uptime → top → ps → journalctl is the fastest path
  • Restarting is a last resort. Save evidence before you decide.

Test Environment

Commands in this article were tested on Ubuntu 24.04 LTS / bash 5.2.
