How to Troubleshoot CPU 100% on Linux - top, ps, load average Guide

What You'll Learn

  • How to quickly identify the process causing CPU 100%
  • How to distinguish between "true CPU issue" vs "I/O wait" or "swap thrashing"
  • How to save evidence (command output, logs) before restarting, so the cause can be found and prevented

Quick Summary

When CPU is high, follow this order:

  1. Check overall pressure: uptime (load average)
  2. Find the culprit: top (sort by CPU; also check %us/%sy/%wa)
  3. Save a top-CPU list: ps aux --sort=-%cpu | head
  4. If it's a service, check its logs: systemctl status / journalctl -u
  5. Rule out "looks like CPU but isn't": I/O wait (%wa) / memory shortage (swap)

Prerequisites

  • OS: Ubuntu
  • Target: Server beginners
  • sudo access
  • Goal: isolate the problem -> find the root cause -> take first steps toward prevention

1. Two Types of "CPU 100%" (Blind Spot)

Conclusion: High CPU is either true computation (%us high) or I/O wait (%wa high). Identify which one first.

When CPU is high, the cause is usually one of these:

Type A: Actually CPU Computation Is Heavy

  • Calculation/loops/encryption/conversion/aggregation
  • top shows a specific process eating CPU
  • %us (user CPU) is usually high

Type B: Looks Like CPU But Actually I/O Wait / Memory Shortage

  • Slow disk (log bloat, DB, EBS degradation, I/O limit)
  • Growing swap usage (memory shortage) makes everything extremely slow
  • %wa (I/O wait) is usually high

Jumping to "CPU is at 100%, so add more CPU" is premature. First check %wa and swap to confirm that it really is CPU.
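
One quick way to answer "Is it really CPU?" is vmstat, which prints CPU time (us/sy/wa) and swap activity (si/so) side by side. The sketch below is only an illustration: the helper name vmstat_check is made up, and it looks columns up by header name because the exact column set can differ between procps versions. The first vmstat row is an average since boot, so read the later rows.

```shell
#!/usr/bin/env bash
# Sketch: pull us/sy/wa (CPU) and si/so (swap in/out) out of vmstat output.
# vmstat_check is a hypothetical helper name; columns are found by header name.
vmstat_check() {
  awk '
    # Column header row looks like: " r  b   swpd   free ... us sy id wa st"
    $1 == "r" && $2 == "b" { for (i = 1; i <= NF; i++) col[$i] = i; next }
    # Data rows start with a number; print only the columns we care about
    ("us" in col) && $1 ~ /^[0-9]+$/ {
      printf "us=%s sy=%s wa=%s si=%s so=%s\n",
        $(col["us"]), $(col["sy"]), $(col["wa"]), $(col["si"]), $(col["so"])
    }'
}

# Live usage: 3 samples, 1 second apart (the first row is the average since boot)
if command -v vmstat >/dev/null 2>&1; then
  vmstat 1 3 | vmstat_check
fi
```

Reading it: high us with si/so near 0 points to Type A; high wa, or si/so that stay non-zero, points to Type B.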

2. uptime for load average (Overall Pressure)

Conclusion: Run uptime. A load average that stays above the number of CPU cores signals sustained system pressure.

$ uptime

Example output:

 14:05:12 up 10 days,  3:20,  2 users,  load average: 3.52, 3.10, 2.90

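What load average means (simplified for beginners):

  • The three numbers are the 1-, 5- and 15-minute averages; if the first is well above the last, load is rising
  • Compare them with the number of CPU cores (check it with nproc)
  • If CPU cores = 4, load around 4 means "busy"
  • Load at 10 on 4 cores means pretty congested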
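
To compare load with the core count directly, the sketch below reads the three averages from /proc/loadavg (the same numbers uptime prints) and the core count from nproc. The helper name load_status, the 1.0 and 2.0 per-core cut-offs and the ok/busy/congested labels are illustrative assumptions, not kernel-defined thresholds.

```shell
#!/usr/bin/env bash
# Sketch: compare the load average with the number of CPU cores.
# load_status is a hypothetical helper; the 1.0 / 2.0 per-core cut-offs are assumptions.

# load_status LOAD CORES -> prints "<load per core> <label>"
load_status() {
  awk -v load="$1" -v cores="$2" 'BEGIN {
    per_core = load / cores
    if (per_core < 1.0)      label = "ok"
    else if (per_core < 2.0) label = "busy"
    else                     label = "congested"
    printf "%.2f %s\n", per_core, label
  }'
}

# Live usage: 1-, 5-, 15-minute load from /proc/loadavg, core count from nproc
if [ -r /proc/loadavg ]; then
  read -r load1 load5 load15 _ < /proc/loadavg
  echo "load: $load1 $load5 $load15, cores: $(nproc)"
  load_status "$load1" "$(nproc)"
fi
```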

Note: on Linux, load counts not only processes running or waiting for a CPU but also processes blocked on I/O (uninterruptible sleep), so look at top next for the breakdown.
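
The two kinds of tasks behind the load average can be told apart by process state: R means running or waiting for a CPU, D means uninterruptible sleep, usually blocked on disk or network storage I/O. The sketch below counts them with ps. It is a rough, instantaneous snapshot counted per process rather than per thread, and count_states is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count processes in state R (running/runnable) and D (uninterruptible, usually I/O).
# count_states is a hypothetical helper; it reads one STAT value per line on stdin.
count_states() {
  awk '
    $1 ~ /^R/ { r++ }   # running or waiting for a CPU
    $1 ~ /^D/ { d++ }   # blocked in uninterruptible sleep (typically disk I/O)
    END { printf "running=%d uninterruptible=%d\n", r, d }'
}

# Live usage: "stat=" prints the STAT column without a header
ps -eo stat= | count_states

# List the D-state processes themselves, if any
ps -eo stat=,pid=,comm= | awk '$1 ~ /^D/'
```

A high load with many D-state processes and few R-state ones is the I/O-wait picture rather than a CPU one.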

3. top for "Culprit" and "CPU Breakdown"

Conclusion: Run top, press P to sort, then read %us/%sy/%wa to classify the load.

$ top

3-1. top Shortcuts (Must Know)

  • P: Sort by CPU
  • M: Sort by memory (when memory is suspect)
  • 1: Show per-core CPU (check for imbalance)
  • q: Quit

3-2. Reading CPU Breakdown (%us / %sy / %wa)

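Shown in the %Cpu(s) line near the top of the display (labeled us, sy, wa):

  • %us: CPU time spent running application code
  • %sy: CPU time spent in the kernel (system calls, network and disk handling)
  • %wa: I/O wait (CPU idle while waiting for a slow or congested disk)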

Quick interpretation:

  • %us high -> App processing is heavy (Type A)
  • %wa high -> I/O is congested (Type B)
  • %sy high -> System-side processing is heavy
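
top derives these percentages from the CPU time counters in /proc/stat, sampled twice. The sketch below does roughly the same calculation by hand, which can help in scripts where interactive top is not an option. It assumes the usual field order of the aggregate cpu line (user, nice, system, idle, iowait, irq, softirq, steal) and prints only us, sy and wa; ni, hi, si and st count toward the total but are not shown. cpu_breakdown is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: compute us/sy/wa percentages from two samples of the "cpu" line in /proc/stat.
# Assumed field order: user nice system idle iowait irq softirq steal ...
# cpu_breakdown is a hypothetical helper name.
cpu_breakdown() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, " "); split(b, y, " ")
    # Fields 2..9 = user..steal; guest time is already included in user, so stop at steal
    total = 0
    for (i = 2; i <= 9; i++) total += y[i] - x[i]
    if (total <= 0) { print "no change between samples"; exit }
    us = 100 * (y[2] - x[2]) / total   # user
    sy = 100 * (y[4] - x[4]) / total   # system
    wa = 100 * (y[6] - x[6]) / total   # iowait
    printf "us=%.1f sy=%.1f wa=%.1f\n", us, sy, wa
  }'
}

# Live usage: two samples one second apart
if [ -r /proc/stat ]; then
  s1=$(head -n 1 /proc/stat); sleep 1; s2=$(head -n 1 /proc/stat)
  cpu_breakdown "$s1" "$s2"
fi
```

Read the result the same way as the list above: whichever of us or wa dominates suggests Type A or Type B.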

4. ps for CPU Top List (Save Evidence)

Conclusion: Run ps aux --sort=-%cpu | head -n 20 and save the output before any restart.

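top refreshes continuously, while ps prints a one-time snapshot you can save to a file. Note that ps's %CPU is the average over each process's lifetime, so a process that only just started spinning can rank lower in ps than in top.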

$ ps aux --sort=-%cpu | head -n 20

Key columns to look at:

  • COMMAND: what is eating CPU
  • %CPU: which process is the outlier

If many processes of the same type show up (worker proliferation), worker limits are probably missing or the load is excessive.
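
To spot worker proliferation, it helps to group the ps output by command name: how many processes each program has and how much %CPU they add up to. A minimal sketch; group_by_command is a hypothetical helper, and the %CPU totals inherit ps's averaging, so treat them as indicative only.

```shell
#!/usr/bin/env bash
# Sketch: group processes by command name -> "<count> <total %CPU> <command>".
# group_by_command is a hypothetical helper; it reads "PCPU COMMAND" lines on stdin.
group_by_command() {
  awk '{
      cpu = $1
      $1 = ""                     # drop the %CPU field...
      name = substr($0, 2)        # ...and keep the rest (command names may contain spaces)
      count[name]++
      total[name] += cpu
    }
    END { for (n in count) printf "%d %.1f %s\n", count[n], total[n], n }' |
    sort -rn
}

# Live usage: top 10 commands by number of processes
ps -eo pcpu=,comm= | group_by_command | head -n 10
```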

5. If It's a Service: systemctl / journalctl

Conclusion: When a named service is the cause, check systemctl status and journalctl -u.

If the process is nginx, apache2, or an app (php-fpm, node, gunicorn, etc.), logs often show the cause.

5-1. Service Status

$ sudo systemctl status nginx
$ sudo systemctl status apache2
$ sudo systemctl status php8.1-fpm

5-2. Logs (Last 200 Lines)

$ sudo journalctl -u nginx -n 200
$ sudo journalctl -u php8.1-fpm -n 200

When CPU is high, there is often a flood of requests, error spam, or a restart loop behind it. Look at the logs for these patterns first.
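
Error spam is easier to see when identical log lines are counted. The sketch below assumes the messages are read with journalctl's -o cat output mode (message text only, no timestamps), so identical messages collapse into one line with a count. top_repeated is a hypothetical helper, nginx is only an example unit, and lines that embed request IDs or timestamps in the message body will not collapse.

```shell
#!/usr/bin/env bash
# Sketch: show the most frequently repeated log lines ("error spam").
# top_repeated is a hypothetical helper; $1 = how many lines to show (default 10).
top_repeated() {
  awk '{ count[$0]++ } END { for (line in count) printf "%d %s\n", count[line], line }' |
    sort -rn | head -n "${1:-10}"
}

# Usage (nginx is just an example unit; -o cat prints only the message text):
#   sudo journalctl -u nginx -n 200 -o cat | top_repeated 10
# A restart loop shows up as a growing restart counter (on reasonably recent systemd):
#   systemctl show -p NRestarts nginx
```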

6. Rule Out "Actually Not CPU" (Important)

Conclusion: Run free -h and check %wa in top. High swap usage points to memory shortage and high %wa to disk I/O, not CPU.

Skip this and you'll misdiagnose.

6-1. Check for Memory Shortage (Swap)

$ free -h

If swap usage keeps growing or the available column is small, the system may look CPU-bound while the root cause is memory.
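
free -h reads its numbers from /proc/meminfo. For scripts or saved evidence, a sketch like the one below prints the two values that matter here: how much memory is still available and how much swap is in use. meminfo_summary is a hypothetical helper name. A single swap figure only says swap was used at some point; to see whether the system is actively swapping right now, watch the si/so columns of vmstat 1 5 for non-zero values.

```shell
#!/usr/bin/env bash
# Sketch: summarize memory pressure from /proc/meminfo (values there are in kB).
# meminfo_summary is a hypothetical helper; it reads meminfo-formatted text on stdin.
meminfo_summary() {
  awk '
    $1 == "MemTotal:"     { mem_total = $2 }
    $1 == "MemAvailable:" { mem_avail = $2 }
    $1 == "SwapTotal:"    { swap_total = $2 }
    $1 == "SwapFree:"     { swap_free = $2 }
    END {
      if (mem_total > 0) avail_pct = 100 * mem_avail / mem_total
      printf "mem_available=%.0f%% swap_used=%dMB\n", avail_pct, (swap_total - swap_free) / 1024
    }'
}

if [ -r /proc/meminfo ]; then
  meminfo_summary < /proc/meminfo
fi
```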

6-2. If %wa Is High, Suspect Disk I/O

  • Log bloat
  • DB
  • Docker images/layers piling up on disk
  • Storage performance issues

iostat is part of the sysstat package (sudo apt install sysstat if it is missing):

$ iostat -x 1 5
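
In iostat -x output, the %util column is roughly the share of time a device was busy; values near 100% over several samples suggest the disk is saturated. The sketch below flags such rows. busy_devices and the 80% threshold are assumptions for illustration, LC_ALL=C keeps the decimal point as ".", and the first report iostat prints is an average since boot. On SSDs and RAID arrays that serve requests in parallel, %util near 100% does not necessarily mean the device is maxed out, so also look at r_await/w_await.

```shell
#!/usr/bin/env bash
# Sketch: print "device %util" for rows of `iostat -x` whose %util exceeds a threshold.
# busy_devices is a hypothetical helper; $1 = threshold in percent (e.g. 80).
busy_devices() {
  awk -v t="$1" '
    # Header row: "Device ... %util" (older sysstat prints "Device:")
    $1 ~ /^Device/ { util = 0; for (i = 1; i <= NF; i++) if ($i == "%util") util = i; next }
    NF == 0 { util = 0; next }                 # a blank line ends the device table
    util > 0 && ($(util) + 0) > t { print $1, $(util) }'
}

if command -v iostat >/dev/null 2>&1; then
  LC_ALL=C iostat -x 1 3 | busy_devices 80
fi
```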

If I/O wait is the cause, adding CPU won't help; it only adds cost.

7. Common Cause Patterns (By Frequency)

Conclusion: Runaway processes, traffic spikes, worker bursts and I/O masking each look different in top.

Pattern 1: Infinite Loop / Bug / Runaway

  • One process consuming CPU constantly (%us high)
  • Fix: Check logs, recent deployments, recent changes

Pattern 2: Bot / Scan / Traffic Surge

  • nginx/apache access logs exploding
  • Fix: Check web logs, consider rate limiting/WAF/caching
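
For a traffic surge, a useful first look is which client addresses send the most requests. This sketch assumes the default combined log format, where the client address is the first field, and the Ubuntu default log paths; behind a proxy or with a custom format the field will differ. top_clients is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count requests per client address in an access log.
# Assumes the default "combined" format, where field 1 is the client address.
# top_clients is a hypothetical helper; $1 = how many clients to show (default 10).
top_clients() {
  awk '{ hits[$1]++ } END { for (ip in hits) printf "%d %s\n", hits[ip], ip }' |
    sort -rn | head -n "${1:-10}"
}

# Live usage (Ubuntu default paths; reading them usually needs sudo):
log=/var/log/nginx/access.log      # Apache: /var/log/apache2/access.log
if [ -r "$log" ]; then
  tail -n 10000 "$log" | top_clients 10
fi
```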

Pattern 3: Too Many Workers (php-fpm, etc.)

  • Child processes keep growing, eating CPU and memory
  • Fix: Set worker limits (e.g., pm.max_children for php-fpm, MaxRequestWorkers for Apache)

Pattern 4: I/O Is Slow, So CPU Appears High

  • %wa is high
  • Fix: Investigate disk I/O

8. Things to Avoid

Conclusion: Save uptime, free -h, ps aux before restart; verify %wa before scaling.

Don't: Restart Without Saving Evidence

At minimum, capture these before deciding:

  • uptime
  • free -h
  • ps aux --sort=-%cpu | head
  • top (a screenshot, or plain text via top -b -n 1)
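
These captures can be bundled into one command so they are not forgotten under pressure. A minimal sketch: save_evidence and the /tmp/cpu-evidence-<timestamp> location are assumptions, and top -b -n 1 (batch mode) is used because it writes plain text that can be saved instead of a screenshot.

```shell
#!/usr/bin/env bash
# Sketch: save basic CPU-investigation evidence into a directory before any restart.
# save_evidence and the output location are hypothetical; adjust to your environment.
save_evidence() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  # Each command's stdout and stderr go to its own file, so a missing tool is recorded too
  { uptime; }                           > "$dir/uptime.txt" 2>&1
  { free -h; }                          > "$dir/free.txt"   2>&1
  { ps aux --sort=-%cpu | head -n 20; } > "$dir/ps_cpu.txt" 2>&1
  { top -b -n 1 | head -n 40; }         > "$dir/top.txt"    2>&1
  echo "$dir"
}

save_evidence "/tmp/cpu-evidence-$(date +%Y%m%d-%H%M%S)"
```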

Don't: Scale Up CPU Without Checking %wa

If I/O wait is the cause, scaling up CPU is wasted money.

Don't: Leave Unlimited Worker Settings Alone

The system can fall over as soon as traffic increases. Set limits first.

Copy-Paste Template

# 1) Overview
uptime

# 2) Also check memory (rule out "actually not CPU")
free -h

# 3) CPU top list (evidence)
ps aux --sort=-%cpu | head -n 20

# 4) top for breakdown (%us/%sy/%wa) and culprit
top

# 5) If it's a service, check logs
sudo systemctl status <service>
sudo journalctl -u <service> -n 200

Summary

  • CPU high: first distinguish "true CPU" vs "I/O wait / memory shortage"
  • uptime -> top -> ps -> journalctl is the fastest path
  • Restart is last resort. Save evidence before deciding.
