How to Troubleshoot CPU 100% on Linux - top, ps, load average Guide
What You'll Learn
- How to quickly identify the process causing CPU 100%
- How to distinguish between "true CPU issue" vs "I/O wait" or "swap thrashing"
- How to save evidence (logs/values) before restart for prevention
Quick Summary
When CPU is high, follow this order:
- Overall status: uptime (load average)
- Find the culprit with top: top (sort by CPU, also check %us/%sy/%wa)
- Top list with ps: ps aux --sort=-%cpu | head
- If it's a service, check logs: systemctl status / journalctl -u
- Rule out "looks like CPU but isn't": I/O wait (wa) / memory shortage (swap)
Prerequisites
- OS: Ubuntu
- Target: Server beginners
- sudo access
- Goal: Isolation -> Root cause -> Prevention entry
1. Two Types of "CPU 100%" (Blind Spot)
Conclusion: High CPU is either true compute (%us high) or I/O wait (%wa high); identify which one first.
When CPU is high, the cause is usually one of these:
Type A: Actually CPU Computation Is Heavy
- Calculation/loops/encryption/conversion/aggregation
- top shows a specific process eating CPU
- %us (user CPU) is usually high
Type B: Looks Like CPU But Actually I/O Wait / Memory Shortage
- Slow disk (log bloat, DB, EBS degradation, I/O limit)
- Swap growing = extremely slow (memory shortage)
- %wa (I/O wait) is usually high
"CPU 100% so add more CPU" is premature. First check %wa and swap to confirm "Is it really CPU?"
2. uptime for load average (Overall Pressure)
Conclusion: Run uptime; load above the core count signals sustained system pressure.
$ uptime
Example output:
14:05:12 up 10 days, 3:20, 2 users, load average: 3.52, 3.10, 2.90
What load average means (simplified for beginners):
- If CPU cores = 4, load around 4 means "busy"
- Load at 10 means pretty congested
Note: load includes I/O wait, not just CPU, so next look at top for breakdown.
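The comparison between load and core count can be scripted. The following is a minimal sketch, not a definitive tool: load_per_core is a hypothetical helper name, and the guarded example assumes a typical Linux system where /proc/loadavg is readable and nproc (coreutils) is installed.

```shell
# load_per_core LOAD CORES
# Prints LOAD divided by CORES with two decimals (hypothetical helper).
load_per_core() {
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Example: compare the 1-minute load average with the number of cores.
# Guarded so the snippet stays silent where these sources are missing.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
  read -r load1 rest < /proc/loadavg
  cores="$(nproc)"
  echo "1-min load: $load1, cores: $cores, per core: $(load_per_core "$load1" "$cores")"
fi
```

A per-core value around 1.00 matches the "busy" level described above (load 4 on 4 cores); values well above that suggest congestion.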
3. top for "Culprit" and "CPU Breakdown"
Conclusion: Run top, press P to sort, then read %us/%sy/%wa to classify the load.
$ top
3-1. top Shortcuts (Must Know)
- P: Sort by CPU
- M: Sort by memory (when memory is suspect)
- 1: Show per-core CPU (check for imbalance)
- q: Quit
3-2. Reading CPU Breakdown (%us / %sy / %wa)
Shown at top of the display:
- %us: App computation using CPU
- %sy: Kernel/system processing (network/disk/interrupts)
- %wa: I/O wait (disk slow/congested)
Quick interpretation:
- %us high -> App processing is heavy (Type A)
- %wa high -> I/O is congested (Type B)
- %sy high -> System-side processing is heavy
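The quick interpretation above can be written down as a small decision function. This is only a sketch: classify_cpu is a hypothetical helper, and the cut-offs (20, 50, 30) are illustrative assumptions chosen for the example, not standard thresholds. Adjust them to what is normal for your server.

```shell
# classify_cpu US SY WA
# Maps the three percentages from top's CPU line to the guide's categories.
# The cut-offs below are illustrative assumptions, not standard values.
classify_cpu() {
  awk -v us="$1" -v sy="$2" -v wa="$3" 'BEGIN {
    if (wa + 0 >= 20)      print "Type B: I/O wait"
    else if (us + 0 >= 50) print "Type A: application compute"
    else if (sy + 0 >= 30) print "system-side processing"
    else                   print "inconclusive"
  }'
}
```

For example, classify_cpu 85.0 5.0 1.2 reports Type A, while classify_cpu 10.0 5.0 60.0 reports Type B. %wa is tested first because ruling out I/O wait is the step most often skipped.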
4. ps for CPU Top List (Save Evidence)
Conclusion: Run ps aux --sort=-%cpu | head -n 20 and save the output before any restart.
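top shows a live view that keeps changing, while ps prints a one-time snapshot that is easy to save to a file. Note that %CPU in ps is CPU time averaged over the process's lifetime, so a long-running process that only recently spiked can look lower here than in top.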
$ ps aux --sort=-%cpu | head -n 20
Key columns to look at:
- COMMAND: What's eating CPU
- %CPU: Which is the outlier
If many of the same type (worker proliferation), likely missing limits or excessive load.
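To spot worker proliferation, it can help to count processes by command name instead of reading the list by eye. The sketch below assumes a procps-style ps; count_by_command is a hypothetical helper that only does the counting, so it works on any list of names.

```shell
# count_by_command
# Reads one command name per line on stdin and prints "COUNT NAME",
# most frequent first (hypothetical helper).
count_by_command() {
  sort | uniq -c | sort -rn | sed -E 's/^ +//'
}

# Typical use (assumes procps ps; "comm=" prints names without a header):
#   ps -eo comm= | count_by_command | head -n 10
```

If one name such as php-fpm dominates the top of this list, that points to the "many of the same type" case described above.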
5. If It's a Service: systemctl / journalctl
Conclusion: When a named service is the cause, check systemctl status and journalctl -u.
If the process is nginx, apache2, or an app (php-fpm, node, gunicorn, etc.), logs often show the cause.
5-1. Service Status
$ sudo systemctl status nginx
$ sudo systemctl status apache2
$ sudo systemctl status php8.1-fpm
5-2. Logs (Last 200 Lines)
$ sudo journalctl -u nginx -n 200
$ sudo journalctl -u php8.1-fpm -n 200
When CPU is high, often there's "massive requests", "error spam", or "restart loops" happening. First look at logs for patterns.
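Looking for patterns is easier when near-identical lines are grouped. The sketch below is one possible approach, not the only one: top_repeated is a hypothetical helper that replaces every run of digits with N, so messages differing only in numbers (durations, ports, IDs) are counted together.

```shell
# top_repeated [LIMIT]
# Reads log lines on stdin, masks digit runs as "N", and prints the
# LIMIT most frequent lines as "COUNT MESSAGE" (default LIMIT: 5).
top_repeated() {
  sed -E 's/[0-9]+/N/g' | sort | uniq -c | sort -rn | head -n "${1:-5}" | sed -E 's/^ +//'
}

# Typical use ("-o cat" prints only the message text, without timestamps):
#   sudo journalctl -u nginx -n 200 -o cat | top_repeated 5
```

A single message with a count close to the number of lines read is a strong hint of error spam or a restart loop.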
6. Rule Out "Actually Not CPU" (Important)
Conclusion: Run free -h and check %wa in top; growing swap points to a memory shortage and high %wa points to disk I/O, not CPU.
Skip this and you'll misdiagnose.
6-1. Check for Memory Shortage (Swap)
$ free -h
If swap usage is growing or "available" is small, the problem may look like CPU while the root cause is actually memory.
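For saving a number rather than a screenshot, used swap can be computed from /proc/meminfo. This is a minimal sketch assuming the usual SwapTotal and SwapFree lines in kB; swap_used_kb is a hypothetical helper name.

```shell
# swap_used_kb
# Reads /proc/meminfo-style lines on stdin and prints used swap in kB
# (SwapTotal minus SwapFree). Hypothetical helper.
swap_used_kb() {
  awk '$1 == "SwapTotal:" { total = $2 }
       $1 == "SwapFree:"  { unused = $2 }
       END { print total - unused }'
}

# Guarded example for Linux systems that expose /proc/meminfo.
if [ -r /proc/meminfo ]; then
  echo "swap used: $(swap_used_kb < /proc/meminfo) kB"
fi
```

Used swap alone can be left over from an earlier event. To see whether swapping is happening right now, vmstat 1 5 shows swap-in and swap-out in its si and so columns.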
6-2. If %wa Is High, Suspect Disk I/O
- Log bloat
- DB
- Docker layer explosion
- Storage performance issues
$ iostat -x 1 5
If I/O wait is the cause, adding CPU won't help (wasted opportunity).
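On Ubuntu, iostat is part of the sysstat package, so it may need to be installed first (sudo apt install sysstat). As a complementary check that needs only ps, the sketch below lists processes in state D (uninterruptible sleep), which usually means they are blocked on I/O. blocked_processes is a hypothetical helper name.

```shell
# blocked_processes
# Reads "STATE PID COMMAND" lines on stdin and keeps those whose state
# starts with D (uninterruptible sleep, usually waiting on I/O).
blocked_processes() {
  awk '$1 ~ /^D/'
}

# Typical use (assumes procps ps; trailing "=" suppresses the headers):
#   ps -eo state=,pid=,comm= | blocked_processes
```

Processes that stay in this list across several runs are good candidates for the source of a high %wa.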
7. Common Cause Patterns (By Frequency)
Conclusion: Runaway, traffic spike, worker burst, I/O masking — each looks different in top.
Pattern 1: Infinite Loop / Bug / Runaway
- One process consuming CPU constantly (%us high)
- Fix: Check logs, recent deployments, recent changes
Pattern 2: Bot / Scan / Traffic Surge
- nginx/apache access logs exploding
- Fix: Check web logs, consider rate limiting/WAF/caching
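Checking web logs for a surge often starts with the busiest client addresses. The sketch below assumes the client address is the first field of each line, as in the common default access log formats; top_clients is a hypothetical helper, and the log path in the usage comment is the usual Ubuntu location for nginx and may differ on your server.

```shell
# top_clients [LIMIT]
# Reads access-log lines on stdin and prints "COUNT ADDRESS" for the
# busiest clients (default LIMIT: 10). Assumes the address is field 1.
top_clients() {
  awk '{ print $1 }' | sort | uniq -c | sort -rn | head -n "${1:-10}" | sed -E 's/^ +//'
}

# Typical use (path is an assumption; adjust to your setup):
#   sudo tail -n 5000 /var/log/nginx/access.log | top_clients 5
```

A handful of addresses accounting for most recent requests suggests a bot or scan, which is where rate limiting or a WAF becomes relevant.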
Pattern 3: Too Many Workers (php-fpm, etc.)
- Child processes keep growing, eating CPU and memory
- Fix: Set worker limits (for example, pm.max_children for php-fpm)
Pattern 4: I/O Is Slow, So CPU Appears High
- %wa is high
- Fix: Investigate disk I/O
8. Things to Avoid
Conclusion: Save uptime, free -h, ps aux before restart; verify %wa before scaling.
Don't: Restart Without Saving Evidence
At minimum, capture these before deciding:
- uptime
- free -h
- ps aux --sort=-%cpu | head
- top (screenshot if possible)
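Capturing these by hand under pressure is error-prone, so a small collector can help. The following is a sketch under assumptions: run_and_save and collect_evidence are hypothetical helper names, the default directory name is arbitrary, and top -b -n 1 (batch mode, one iteration) stands in for the screenshot.

```shell
# run_and_save FILE COMMAND [ARGS...]
# Runs COMMAND and stores stdout and stderr in FILE. A missing command is
# recorded in FILE instead of aborting the collection.
run_and_save() {
  out="$1"
  shift
  if command -v "$1" >/dev/null 2>&1; then
    "$@" >"$out" 2>&1 || true
  else
    echo "$1: command not found" >"$out"
  fi
}

# collect_evidence [DIR]
# Saves the outputs listed above into DIR and prints the directory path.
collect_evidence() {
  dir="${1:-./cpu-evidence-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "$dir" || return 1
  run_and_save "$dir/uptime.txt" uptime
  run_and_save "$dir/free.txt" free -h
  run_and_save "$dir/ps.txt" ps aux --sort=-%cpu
  run_and_save "$dir/top.txt" top -b -n 1
  echo "$dir"
}
```

Calling collect_evidence with no argument writes into a timestamped directory under the current one, so repeated runs before and after a restart do not overwrite each other.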
Don't: Scale Up CPU Without Checking %wa
If I/O wait is the cause, CPU scaling is a waste (opportunity cost).
Don't: Leave Unlimited Worker Settings Alone
System will break the moment traffic increases. Set limits first.
Copy-Paste Template
# 1) Overview
uptime
# 2) Also check memory (rule out "actually not CPU")
free -h
# 3) CPU top list (evidence)
ps aux --sort=-%cpu | head -n 20
# 4) top for breakdown (%us/%sy/%wa) and culprit
top
# 5) If it's a service, check logs
sudo systemctl status <service>
sudo journalctl -u <service> -n 200
Summary
- CPU high: first distinguish "true CPU" vs "I/O wait / memory shortage"
- uptime -> top -> ps -> journalctl is the fastest path
- Restart is last resort. Save evidence before deciding.