How to Troubleshoot CPU 100% on Linux: top, ps, and Load Average Guide

2025-12-16 | Reading time: About 10 min | Difficulty: Intermediate

What You'll Learn

How to quickly identify the process causing CPU 100%
How to tell a true CPU problem apart from I/O wait or swap thrashing
How to save evidence (logs and metrics) before restarting, so you can prevent a recurrence

Quick Summary

When CPU is high, follow this order:

Overall status: uptime (load average)
Find the culprit: top (sort by CPU, and check %us/%sy/%wa)
Capture the top consumers: ps aux --sort=-%cpu | head
If it's a service, check its logs: systemctl status / journalctl -u
Rule out "looks like CPU but isn't": I/O wait (wa) and memory shortage (swap)

Prerequisites

OS: Ubuntu
Target: Server beginners
sudo access
Goal: Isolate the problem -> find the root cause -> plan prevention

1. Two Types of "CPU 100%" (A Common Blind Spot)

Conclusion: High CPU is either true computation (%us high) or waiting on I/O (%wa high); identify which one first.

When CPU is high, the cause is usually one of these:

Type A: Actually CPU Computation Is Heavy

Heavy computation: loops, encryption, format conversion, aggregation
top shows a specific process consuming CPU
%us (user CPU) is usually high

Type B: Looks Like CPU But Actually I/O Wait / Memory Shortage

Slow disk (log bloat, heavy DB I/O, degraded EBS volumes, I/O limits)
Growing swap usage (memory shortage), which makes everything extremely slow
%wa (I/O wait) is usually high

Jumping to "CPU is at 100%, so add more CPU" is premature. First check %wa and swap to confirm that CPU really is the bottleneck.

2. uptime for Load Average (Overall Pressure)

Conclusion: Run uptime; a load average that stays above the CPU core count signals sustained system pressure.

$ uptime

Example output:

 14:05:12 up 10 days,  3:20,  2 users,  load average: 3.52, 3.10, 2.90

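What load average means (simplified for beginners):

The three numbers are the 1-, 5-, and 15-minute averages
With 4 CPU cores (check with nproc), a load around 4 means all cores are busy
A load of 10 on the same machine means heavy congestion: tasks are queuing
A 1-minute value above the 15-minute value (as in the example) means the load is rising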

Note: On Linux, load average also counts tasks waiting on disk I/O (uninterruptible sleep), not only tasks using or waiting for CPU, so look at top next for the breakdown.
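The comparison with the core count can be scripted. This is a minimal sketch, assuming bash, awk, and a standard /proc filesystem; load_ratio and check_load are illustrative helper names, not existing tools. nproc reports the CPUs available to the current process, which inside a container may differ from the host total.

```shell
#!/usr/bin/env bash
# Compare the load averages with the number of CPU cores.
# load_ratio and check_load are illustrative helper names.

# load_ratio LOAD CORES: print LOAD divided by CORES with two decimals
load_ratio() {
  awk -v l="$1" -v c="$2" 'BEGIN { printf "%.2f\n", l / c }'
}

# check_load: the first three fields of /proc/loadavg are the 1-, 5- and 15-minute averages
check_load() {
  local cores one five fifteen
  cores=$(nproc)
  read -r one five fifteen _ < /proc/loadavg
  echo "cores=$cores load(1m/5m/15m)=$one/$five/$fifteen"
  echo "1m load per core: $(load_ratio "$one" "$cores")"
  echo "15m load per core: $(load_ratio "$fifteen" "$cores")"
}

check_load
```

With the example output above on a 4-core machine, 3.52 / 4 gives 0.88 per core: busy, but not yet queuing.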

3. top for "Culprit" and "CPU Breakdown"

Conclusion: Run top, press P to sort by CPU, then read %us/%sy/%wa to classify the load.

$ top

3-1. top Shortcuts (Must Know)

P: Sort by CPU
M: Sort by memory (when memory is suspect)
1: Toggle per-core CPU display (check for imbalance, such as one core pegged at 100%)
q: Quit

3-2. Reading CPU Breakdown (%us / %sy / %wa)

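Shown in the %Cpu(s) line at the top of the display (labeled us, sy, wa):

%us: Time spent running application (user-space) code
%sy: Time spent in the kernel (system calls, network and disk handling); hardware and software interrupts are shown separately as hi and si
%wa: Idle time while waiting for I/O to complete (disk slow or congested)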

Quick interpretation:

%us high -> App processing is heavy (Type A)
%wa high -> I/O is congested (Type B)
%sy high -> Kernel-side processing is heavy (e.g. heavy network traffic or a flood of system calls)
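To see where these percentages come from, the sketch below reads the kernel's cumulative CPU counters in /proc/stat twice and computes each share over the interval. It assumes bash, awk, and the standard field order of the first "cpu" line; cpu_breakdown and sample_cpu are illustrative names, and the result only approximates top, which also reports ni, hi, si, and st.

```shell
#!/usr/bin/env bash
# Measure the us/sy/wa split over an interval from /proc/stat,
# roughly the same numbers top shows in its %Cpu(s) line.

# cpu_breakdown BEFORE AFTER: each argument is the aggregate "cpu" line of /proc/stat
# Fields: cpu user nice system idle iowait irq softirq steal [guest guest_nice]
cpu_breakdown() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, " "); split(b, y, " ")
    # Total = user..steal (fields 2-9); guest time is already included in user/nice
    total = 0
    for (i = 2; i <= 9; i++) total += y[i] - x[i]
    if (total <= 0) total = 1   # avoid division by zero on identical samples
    printf "us=%.1f sy=%.1f wa=%.1f\n",
      100 * (y[2] - x[2]) / total,
      100 * (y[4] - x[4]) / total,
      100 * (y[6] - x[6]) / total
  }'
}

# sample_cpu [SECONDS]: take two samples SECONDS apart (default 1)
sample_cpu() {
  local before after
  before=$(head -n 1 /proc/stat)
  sleep "${1:-1}"
  after=$(head -n 1 /proc/stat)
  cpu_breakdown "$before" "$after"
}

sample_cpu 1
```

If you prefer an existing tool, vmstat 1 5 (from procps) prints the same us, sy, and wa columns once per second.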

4. ps for CPU Top List (Save Evidence)

Conclusion: Run ps aux --sort=-%cpu | head -n 20 and save the output before any restart.

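top refreshes continuously, while ps prints a one-time snapshot that you can save to a file. Note that ps calculates %CPU as an average over each process's lifetime, so a process that only recently started spinning may rank lower here than in top.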

$ ps aux --sort=-%cpu | head -n 20

Key columns to look at:

COMMAND: What is consuming CPU
%CPU: Which process stands out

If many processes of the same type appear (worker proliferation), worker limits are probably missing or the load is excessive.
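Both points can be handled in one step: write the snapshot to a timestamped file, then count processes per command name to spot worker proliferation. A minimal sketch, assuming bash and procps ps; the evidence directory and the count_by_command name are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Save the ps snapshot to a timestamped file and count processes per command name.
# The output directory is an assumption; use any location where you keep evidence.
evidence_dir="${EVIDENCE_DIR:-/tmp/cpu-evidence}"
mkdir -p "$evidence_dir"
snapshot="$evidence_dir/ps-$(date +%Y%m%d-%H%M%S).txt"
ps aux --sort=-%cpu | head -n 20 > "$snapshot"
echo "saved: $snapshot"

# count_by_command [N]: read command names on stdin, print the N most frequent
count_by_command() {
  sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# comm= prints only the command name, without a header line
ps -eo comm= | count_by_command 10
```

A single name with a count far above the rest (for example dozens of php-fpm or apache2 entries) is the worker-proliferation signal described above.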

5. If It's a Service: systemctl / journalctl

Conclusion: When a named service is the cause, check systemctl status and journalctl -u.

If the process is nginx, apache2, or an app (php-fpm, node, gunicorn, etc.), logs often show the cause.

5-1. Service Status

$ sudo systemctl status nginx
$ sudo systemctl status apache2
$ sudo systemctl status php8.1-fpm

5-2. Logs (Last 200 Lines)

$ sudo journalctl -u nginx -n 200
$ sudo journalctl -u php8.1-fpm -n 200

High CPU is often accompanied by a flood of requests, error spam, or a restart loop. Look at the logs for these patterns first.
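Two refinements help here: limiting the log window to the period of the spike, and checking whether systemd keeps restarting the unit. The sketch below wraps standard journalctl and systemctl options in functions; the function names are illustrative, and the NRestarts property assumes systemd 235 or later (Ubuntu 18.04 and newer ship a recent enough version).

```shell
#!/usr/bin/env bash
# Helpers for reading service logs around a CPU spike.
# Function names are illustrative; the journalctl/systemctl options are standard.

# recent_logs UNIT [SINCE]: log lines for UNIT since SINCE (default: last 30 minutes)
recent_logs() {
  sudo journalctl -u "$1" --since "${2:-30 min ago}" --no-pager
}

# restart_count UNIT: how many times systemd has automatically restarted UNIT
restart_count() {
  systemctl show "$1" -p NRestarts --value
}

# repeated_messages [N]: read log messages on stdin, print the N most frequent
repeated_messages() {
  sort | uniq -c | sort -rn | head -n "${1:-5}"
}
```

For example, sudo journalctl -u nginx -n 200 -o cat --no-pager | repeated_messages shows whether one error is being repeated; -o cat prints only the message text, so identical messages with different timestamps are counted together. A restart_count that keeps increasing points to a restart loop.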

6. Rule Out "Actually Not CPU" (Important)

Conclusion: Run free -h and check %wa in top; heavy swap use or high %wa means the bottleneck is memory or I/O, not CPU.

Skip this step and you risk a misdiagnosis.

6-1. Check for Memory Shortage (Swap)

$ free -h

If swap usage keeps growing or the available column is small, the system may look CPU-bound while the root cause is actually memory shortage.
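For scripts or saved evidence, the same numbers can be read from /proc/meminfo, which is where free gets them. This is a sketch, assuming a kernel that provides MemAvailable (Linux 3.14 and later); mem_summary is an illustrative name.

```shell
#!/usr/bin/env bash
# Summarize memory pressure from /proc/meminfo (values there are in kB).
# mem_summary is an illustrative helper name.
mem_summary() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    /^SwapTotal:/    { swap_total = $2 }
    /^SwapFree:/     { swap_free = $2 }
    END {
      pct = (total > 0) ? 100 * avail / total : 0
      printf "available=%.0f%% swap_used=%d MiB\n", pct, (swap_total - swap_free) / 1024
    }' "${1:-/proc/meminfo}"
}

mem_summary
# Swap activity matters more than swap size: in "vmstat 1 5",
# si/so columns that stay non-zero mean pages are constantly moving (thrashing).
```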

6-2. If %wa Is High, Suspect Disk I/O

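Typical causes:

Log bloat (huge or rapidly growing log files)
Heavy database I/O
Docker layer explosion (images and layers piling up on disk)
Storage performance limits (slow or throttled volumes)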

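$ iostat -x 1 5

iostat is part of the sysstat package (sudo apt install sysstat). The arguments mean a 1-second interval and 5 reports; the first report covers the time since boot, so read the later ones and watch %util and the await columns for the busy device.

If I/O wait is the cause, adding CPU won't help; the time and money are better spent on the storage side.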
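To find which processes are stuck on I/O, look for tasks in D state (uninterruptible sleep). They usually wait on disk, and they count toward load average even though they use no CPU. A minimal sketch, assuming procps ps; d_state is an illustrative name, and the commented commands need sysstat or sudo.

```shell
#!/usr/bin/env bash
# List processes in D state (uninterruptible sleep, typically waiting on disk I/O).

# d_state: read "STAT PID COMMAND" lines on stdin, keep those whose state starts with D
d_state() {
  awk '$1 ~ /^D/'
}

ps -eo stat=,pid=,comm= | d_state

# Per-process disk I/O (pidstat ships with sysstat):
#   pidstat -d 1 5
# Check whether logs or the journal are bloating the disk:
#   sudo du -xh /var/log | sort -h | tail -n 10
#   journalctl --disk-usage
```

A process that shows up in D state run after run is a strong candidate for the I/O culprit.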

7. Common Cause Patterns (By Frequency)

Conclusion: Runaway processes, traffic spikes, worker bursts, and slow I/O each look different in top.

Pattern 1: Infinite Loop / Bug / Runaway

One process consuming CPU constantly (%us high)
Fix: Check logs, recent deployments, and other recent changes
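To drill into a single runaway process, it helps to know which of its threads burns CPU and how long it has been running. The sketch below uses procps ps options (-L for threads, etime for elapsed time); the function names are illustrative, and PID is whatever top or ps reported.

```shell
#!/usr/bin/env bash
# Drill into one runaway process: which threads burn CPU, and how long it has run.
# Function names are illustrative; pass the PID reported by top or ps.

# busiest_threads PID [N]: per-thread CPU for PID, highest first
busiest_threads() {
  ps -L -o tid=,pcpu=,comm= -p "$1" | sort -k2,2 -rn | head -n "${2:-10}"
}

# process_age PID: elapsed time since the process started ([[DD-]hh:]mm:ss)
process_age() {
  ps -o etime= -p "$1"
}
```

For example, busiest_threads 1234 5 lists the five busiest threads of PID 1234; top -H -p 1234 shows the same view interactively. Comparing process_age with your deployment history shows whether the process was started by a recent change.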

Pattern 2: Bot / Scan / Traffic Surge

nginx/apache access logs growing rapidly
Fix: Check the access logs for the source (IPs, paths), then consider rate limiting, a WAF, or caching
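Checking the access logs usually means answering three questions: who is sending the requests, what they request, and when the surge started. A minimal sketch, assuming the default "combined" log format used by nginx and Apache on Ubuntu; the function names are illustrative.

```shell
#!/usr/bin/env bash
# Find who is hitting the web server and when, from an access log.
# Assumes the "combined" log format: field 1 = client IP,
# field 4 = "[day/Mon/year:HH:MM:SS", field 7 = request path.

# top_clients LOG [N]: most frequent client IPs
top_clients() {
  awk '{ print $1 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# top_paths LOG [N]: most requested paths (scans tend to hit the same few URLs)
top_paths() {
  awk '{ print $7 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# requests_per_minute LOG: request count per minute; log lines are roughly
# in time order, so uniq -c without sort keeps the timeline
requests_per_minute() {
  awk '{ print substr($4, 2, 17) }' "$1" | uniq -c
}
```

On Ubuntu the default files are /var/log/nginx/access.log and /var/log/apache2/access.log; reading them may require sudo or membership in the adm group.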

Pattern 3: Too Many Workers (php-fpm, etc.)

The number of child processes keeps growing, consuming CPU and memory
Fix: Set worker limits (pm.max_children for php-fpm, MaxRequestWorkers for Apache)
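A common starting point for a worker limit is to divide the memory you can give the pool by the average memory per worker. The sketch below measures worker RSS with ps and applies that rule of thumb; the rule itself, the function names, and the php-fpm8.1 process name (matching the php8.1-fpm service used earlier) are assumptions to adapt to your system.

```shell
#!/usr/bin/env bash
# Rough sizing of a worker limit from measured worker memory.
# Rule of thumb (an assumption, not an official formula):
#   max workers ~= memory you can give the pool / average worker RSS

# rss_stats: read RSS values in kB on stdin, print count and average in MiB
rss_stats() {
  awk '{ sum += $1; n++ }
       END { if (n) { printf "%d workers, avg %.0f MiB\n", n, sum / n / 1024 } else { print "no workers found" } }'
}

# suggest_limit BUDGET_MIB AVG_MIB: integer number of workers that fit
suggest_limit() {
  echo $(( $1 / $2 ))
}

# Process name is an assumption matching php8.1-fpm on Ubuntu; adjust to yours
ps -C php-fpm8.1 -o rss= | rss_stats
```

For example, with 2048 MiB for the pool and workers averaging 75 MiB, suggest_limit 2048 75 gives 27. RSS counts shared pages in every worker, so the estimate errs on the safe side. On Ubuntu the value goes into pm.max_children in /etc/php/8.1/fpm/pool.d/www.conf, followed by sudo systemctl reload php8.1-fpm.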

Pattern 4: I/O Is Slow, So CPU Appears High

%wa is high
Fix: Investigate disk I/O

8. Things to Avoid

Conclusion: Save the output of uptime, free -h, and ps aux before restarting; check %wa before scaling up.

Don't: Restart Without Saving Evidence

At minimum, capture these before deciding to restart:

uptime
free -h
ps aux --sort=-%cpu | head
top (a screenshot, or text output from top -b -n 1)
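Capturing this list by hand is easy to forget under pressure, so it can be wrapped in a small script that writes everything into one timestamped directory. A minimal sketch, assuming bash and procps; the capture function name and the /tmp location are assumptions (/tmp may not survive a reboot, so copy the directory somewhere safe).

```shell
#!/usr/bin/env bash
# Capture the minimum evidence into one timestamped directory before a restart.
# The /tmp location is an assumption; copy the result elsewhere if you reboot.
outdir="/tmp/cpu-evidence-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$outdir"

# capture NAME CMD...: run CMD and save stdout and stderr to $outdir/NAME.txt
capture() {
  local name="$1"
  shift
  "$@" > "$outdir/$name.txt" 2>&1 || echo "warning: '$*' failed" >&2
}

capture uptime uptime
capture free free -h
capture ps-cpu sh -c 'ps aux --sort=-%cpu | head -n 20'
# Batch mode prints one refresh of top as plain text (a text alternative to a screenshot)
capture top top -b -n 1

echo "evidence saved in $outdir"
```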

Don't: Scale Up CPU Without Checking %wa

If I/O wait is the cause, scaling up CPU costs money without fixing the problem.

Don't: Leave Unlimited Worker Settings Alone

Without limits, the system can fall over as soon as traffic increases. Set limits first.

Copy-Paste Template

# 1) Overview
uptime

# 2) Also check memory (rule out "actually not CPU")
free -h

# 3) CPU top list (evidence)
ps aux --sort=-%cpu | head -n 20

# 4) top for breakdown (%us/%sy/%wa) and culprit
top

# 5) If it's a service, check logs
sudo systemctl status <service>
sudo journalctl -u <service> -n 200

Summary

When CPU is high, first distinguish a true CPU problem from I/O wait or memory shortage
uptime -> top -> ps -> journalctl is the fastest path
Restarting is the last resort; save evidence before deciding