Ubuntu Memory Troubleshooting - free, top, ps, and OOM Killer Guide

2025-12-16 Reading time: About 12 min Difficulty: Intermediate

What You'll Learn

How to determine if server slowness/crashes are caused by memory issues
How to identify memory-hungry processes
How to check whether the OOM Killer (the kernel killing a process when memory runs out) has triggered
How to move beyond "just restart" and prevent recurrence

Quick Summary

When memory is suspect, follow this order:

Overview: free -h (memory/swap status)
Find culprit: top (high RES, high CPU)
Top list: ps aux --sort=-%mem | head
OOM evidence: journalctl -k | grep -i oom
Fix: Address the process/service (logs, config, memory limits, scaling, swap)

Prerequisites

OS: Ubuntu
Target: Server beginners
sudo access
Goal: Isolation, recovery, and basic prevention

1. Memory Issue Symptoms

Conclusion: Crashes, sluggish SSH, and rising 502/503 errors are often memory symptoms, so check memory first.

"Memory issues" can look many ways:

Server extremely slow (SSH sluggish, commands hang)
Apps crash suddenly / 502/503 increase
Processes die unexpectedly (errors in logs)
Same issue recurs after restart

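Important: High CPU or load can itself be caused by memory. During swap thrashing the system spends its time on disk I/O and page reclaim (often visible as high I/O wait or a busy kswapd), so don't misdiagnose by looking at CPU alone.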

2. free -h for Overview (First Step)

Conclusion: In free -h, low available or growing swap are the clearest danger signals.

$ free -h

Example output:

              total        used        free      shared  buff/cache   available
Mem:           2.0Gi       1.9Gi        30Mi       120Mi        70Mi        80Mi
Swap:          1.0Gi       1.0Gi          0B

Key points:

available: Estimate of memory that new work can use without swapping, including reclaimable cache (small = trouble)
Swap: Nearly full swap is a danger sign (swap thrashing causes extreme slowness)

Rough guidelines:

available at a few tens of MB -> Very dangerous (OOM or extreme slowness likely)
Swap keeps growing -> Root cause likely not addressed
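
The same two signals can be read straight from /proc/meminfo, which is where free gets its numbers. The following is a minimal sketch rather than a monitoring tool: the function name mem_summary and its output format are made up for illustration, and it deliberately applies no thresholds.

```shell
# Summarize available memory and swap usage from a meminfo-style file.
# The optional argument (default: /proc/meminfo) exists so the parser
# can also be run against a saved copy.
mem_summary() {
  awk '
    $1 == "MemTotal:"     {mem_total = $2}
    $1 == "MemAvailable:" {mem_avail = $2}
    $1 == "SwapTotal:"    {swap_total = $2}
    $1 == "SwapFree:"     {swap_free = $2}
    END {
      avail_pct = (mem_total > 0) ? int(mem_avail * 100 / mem_total) : 0
      swap_pct = (swap_total > 0) ? int((swap_total - swap_free) * 100 / swap_total) : 0
      printf "available_mib=%d available_pct=%d swap_used_pct=%d\n",
             int(mem_avail / 1024), avail_pct, swap_pct
    }' "${1:-/proc/meminfo}"
}

# Values in /proc/meminfo are in KiB; the summary converts to MiB and percent.
mem_summary
```

Running it every minute from cron and appending the line to a file is one simple way to see whether available is trending down.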

3. top to Find "Current Culprit" (Look at RES)

Conclusion: In top, press M (Shift+m) to sort by memory and check RES, the physical memory each process currently occupies.

$ top

What beginners should look at:

%MEM: Memory usage percentage
RES: Resident memory, the physical RAM the process currently occupies (important)
%CPU: CPU usage percentage

top shortcuts:

M (Shift+m): Sort by memory
P (Shift+p): Sort by CPU
q: Quit
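
top's RES column corresponds to the VmRSS field in /proc/<PID>/status, and memory that has been pushed out to swap is reported separately there as VmSwap. As a rough sketch for checking a single process without the interactive screen (the helper name proc_mem is hypothetical, and the optional second argument exists only so the parser can be tried on a saved status file):

```shell
# Print resident (VmRSS) and swapped-out (VmSwap) memory in KiB for one PID.
# Usage: proc_mem PID [status-file]
proc_mem() {
  awk '
    /^VmRSS:/  {rss = $2}
    /^VmSwap:/ {swap = $2}
    END {printf "rss_kib=%d swap_kib=%d\n", rss, swap}
  ' "${2:-/proc/$1/status}"
}

# Demo: inspect the current shell itself.
proc_mem "$$"
```

A process with a small rss_kib but a large swap_kib has mostly been swapped out, which top's RES column alone would not show.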

4. ps for Memory Top List (Create Evidence)

Conclusion: Run ps aux --sort=-%mem | head -n 20 and save the output before any restart.

$ ps aux --sort=-%mem | head -n 20

Also check CPU top for comparison:

$ ps aux --sort=-%cpu | head -n 20
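
Since the point of this step is evidence, it helps to write the output to a timestamped file instead of leaving it in the terminal. The sketch below is one way to do that; the helper name save_snapshot, the SNAPSHOT_DIR variable, and the /tmp/mem-snapshots default are all assumptions made for this example.

```shell
# Run a command and save its output to a timestamped file.
# Usage: save_snapshot LABEL COMMAND [ARGS...]
save_snapshot() {
  label=$1
  shift
  dir=${SNAPSHOT_DIR:-/tmp/mem-snapshots}
  mkdir -p "$dir" || return 1
  file="$dir/$label-$(date +%Y%m%d-%H%M%S).txt"
  # Record a failure in the file rather than losing it silently.
  "$@" > "$file" 2>&1 || echo "command failed: $*" >> "$file"
  echo "$file"
}

# Capture the overview plus both process rankings before any restart.
save_snapshot free free -h
save_snapshot ps-mem ps aux --sort=-%mem
save_snapshot ps-cpu ps aux --sort=-%cpu
```

Each call prints the path of the file it wrote, so the paths can be pasted into an incident note.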

Common patterns:

One giant process (e.g., java, node, php-fpm, python, db)
Many same type (worker runaway, process proliferation)
Docker containers multiplying
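
The "many of the same type" pattern is easy to miss in a per-process list, because no single line looks large. Grouping by command name makes it visible. The following is a sketch only: sum_rss_by_name is a hypothetical helper, and summing RSS counts shared pages once per process, so the totals overstate real usage for forked workers.

```shell
# Sum RSS (KiB) and count processes per command name.
# Input lines: "<rss_kib> <command name>", as printed by: ps -e -o rss= -o comm=
# Output lines: "<total_rss_kib> <process_count> <command name>", largest first.
sum_rss_by_name() {
  awk '{
    rss = $1
    $1 = ""              # drop the RSS field, keep the rest as the name
    sub(/^ +/, "")
    count[$0]++
    total[$0] += rss
  } END {
    for (name in total)
      printf "%d %d %s\n", total[name], count[name], name
  }' | sort -rn
}

ps -e -o rss= -o comm= | sum_rss_by_name | head -n 10
```

A high process count next to a modest per-process size points at proliferation; a count of 1 with a huge total points at one giant process.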

5. Check OOM Killer Activity (Often Decisive)

Conclusion: Run sudo journalctl -k | grep -i oom. The OOM Killer log entry names the killed process.

OOM Killer means "the kernel ran out of memory and force-killed a process". It is a very common cause of sudden process death.

5-1. Search Kernel Logs (Recommended)

$ sudo journalctl -k | grep -i oom | tail -n 50

5-2. Search for "killed process"

$ sudo journalctl -k | grep -i "killed process" | tail -n 50

5-3. dmesg Also Works

$ sudo dmesg | grep -i oom | tail -n 50

Typical log:

Out of memory: Killed process 1234 (node) total-vm:... anon-rss:...

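The name of the killed process appears here. It is usually the largest memory consumer at that moment, which makes it the main suspect, though another process may have caused the shortage.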
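
When there are many such lines, pulling out just the PID and name gives a quick victim list. This is a minimal sketch that assumes the "Killed process <pid> (<name>)" wording shown in the typical log above; oom_victims is a hypothetical helper name.

```shell
# Extract "<pid> <name>" from kernel log lines that report an OOM kill.
# Reads log lines on stdin; lines without a kill report are ignored.
oom_victims() {
  sed -nE 's/.*Killed process ([0-9]+) \(([^)]*)\).*/\1 \2/p'
}

# Demo on the sample line from above.
echo 'Out of memory: Killed process 1234 (node) total-vm:... anon-rss:...' | oom_victims
# prints: 1234 node
```

On a live system, pipe the kernel log into it:

$ sudo journalctl -k | oom_victims

Note that journalctl -k only covers the current boot. If the server rebooted after the incident and the journal is stored persistently, add -b -1 to read the previous boot.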

6. Types of Memory Issues (Response Differs)

Conclusion: Classify as spike, leak, or proliferation — each type requires a different fix.

Type A: Temporary Spike (Burst)

Batch processing, heavy aggregation, image processing, etc.
Fix: Isolate processing, memory limits/worker count tuning, add swap, scale up

Type B: Leak / Keeps Growing (Worsens Over Time)

Node/Python/Java apps whose memory keeps growing over a long runtime
Fix: Investigate app memory leak, worker/process restart design, set limits

Type C: Process Proliferation (Fork Bomb / Too Many Workers)

php-fpm children grow too much, queue/traffic causes worker explosion
Fix: Set child process/worker limits, load testing, rate limiting
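
Telling a spike from a leak usually comes down to watching one process's memory over time: a spike rises and falls, a leak only climbs. A minimal sampler for that is sketched below; log_rss is a hypothetical helper name, and the interval and sample count are arbitrary choices.

```shell
# Print "<unix_time> <rss_kib>" for a PID at a fixed interval.
# Usage: log_rss PID INTERVAL_SECONDS COUNT
log_rss() {
  pid=$1
  interval=$2
  count=$3
  i=0
  while [ "$i" -lt "$count" ]; do
    rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status" 2>/dev/null) || rss=""
    # Stop if the process has exited (or reports no VmRSS, e.g. a kernel thread).
    if [ -z "$rss" ]; then
      return 1
    fi
    printf '%s %s\n' "$(date +%s)" "$rss"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}

# Demo: sample the current shell three times, one second apart.
log_rss "$$" 1 3
```

For a real investigation, something like log_rss 1234 60 60 > rss.log (one sample a minute for an hour, with 1234 standing in for the suspect's PID) gives enough points to see the shape.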

7. Quick Recovery Steps (Safety First)

Conclusion: Check logs before restarting. If a restart helps, expect the problem to come back until the cause is fixed.

7-1. If You Know Which Service Is Heavy: Check Logs First

$ sudo journalctl -u nginx -n 200
$ sudo journalctl -u app -n 200

7-2. Service Restart (Last Resort But Sometimes Necessary)

$ sudo systemctl restart <service>
$ sudo systemctl status <service>

Note: If a restart fixes it, the cause is most likely the "memory accumulating" type (a leak or process proliferation), and it will recur if left alone.

8. Swap Makes It Extremely Slow (Swap Hell)

Conclusion: Heavy swap creates a vicious cycle: low memory drives disk I/O, slowing further.

As swap use grows, a vicious cycle sets in: memory shortage -> disk I/O -> even slower.

$ free -h

If swap is nearly full while available stays low, swap is the likely reason for the slowness.
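
For a closer look than free gives, it can help to check whether pages are moving in and out of swap right now. The kernel exposes cumulative swap-in and swap-out page counters as pswpin and pswpout in /proc/vmstat, so sampling them twice gives a rough activity reading. This is a sketch under that assumption; swap_counters is a hypothetical helper name and the one-second window is arbitrary.

```shell
# Print the cumulative "<pages_swapped_in> <pages_swapped_out>" counters.
# The optional argument (default: /proc/vmstat) allows reading a saved copy.
swap_counters() {
  awk '
    $1 == "pswpin"  {in_pages = $2}
    $1 == "pswpout" {out_pages = $2}
    END {print in_pages + 0, out_pages + 0}
  ' "${1:-/proc/vmstat}"
}

# Sample twice and report the difference over the interval.
before=$(swap_counters)
sleep 1
after=$(swap_counters)
echo "$before $after" |
  awk '{printf "pages swapped in: %d, out: %d (last second)\n", $3 - $1, $4 - $2}'
```

Numbers that stay well above zero across repeated runs suggest active thrashing; zeros with a full swap suggest old, idle pages that were swapped out earlier.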

9. Things to Avoid

Conclusion: Don't restart without checking for OOM logs, don't treat swap as a permanent fix, and don't leave workers unlimited.

Don't: Restart Repeatedly Without Looking at Cause

Restarting without checking logs and OOM repeats the same accident.

Don't: Think Swap Is a "Silver Bullet"

Swap buys time but is not a root fix, and heavy swap use degrades performance.

Don't: Leave Unlimited Worker/Process Settings Alone

php-fpm and app workers without limits will eat all memory. Always set limits.
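
For services managed by systemd, one way to set such a limit is a drop-in file with MemoryMax=, which caps the memory of the whole service including its children. The sketch below only writes the file into a scratch directory so that nothing on the system changes; the helper name write_memory_limit, the unit name app.service, and the 512M value are placeholders for illustration, not recommendations.

```shell
# Write a systemd drop-in that caps a service's memory with MemoryMax=.
# Usage: write_memory_limit UNIT LIMIT [BASE_DIR]
write_memory_limit() {
  unit=$1
  limit=$2
  base=${3:-/etc/systemd/system}
  mkdir -p "$base/$unit.d" || return 1
  printf '[Service]\nMemoryMax=%s\n' "$limit" > "$base/$unit.d/memory-limit.conf"
  echo "$base/$unit.d/memory-limit.conf"
}

# Dry run: generate the drop-in under a temporary directory and show it.
scratch=$(mktemp -d)
conf=$(write_memory_limit app.service 512M "$scratch")
cat "$conf"
```

To apply it for real, the file would go under /etc/systemd/system (as root), followed by sudo systemctl daemon-reload and a restart of the service. Application-level limits, such as the php-fpm child count, are set in the application's own configuration and are not covered by this sketch.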

Copy-Paste Template

# 1) Overview
free -h

# 2) Current culprit (by memory; batch mode so it does not block when pasted)
top -b -n 1 -o %MEM | head -n 20

# 3) Memory top list (evidence)
ps aux --sort=-%mem | head -n 20

# 4) OOM evidence (decisive)
sudo journalctl -k | grep -i oom | tail -n 50
sudo journalctl -k | grep -i "killed process" | tail -n 50

Summary

Check free -h for available and Swap first
Use ps aux --sort=-%mem to identify top consumers
Use journalctl -k | grep -i oom to confirm OOM
If restart fixes it temporarily, it will almost certainly recur. Classify and address root cause.