Ubuntu Memory Troubleshooting: free/top/ps and OOM Killer Guide

Memory Troubleshooting - free/top/ps

What You'll Learn

  • How to determine if server slowness/crashes are caused by memory issues
  • How to identify memory-hungry processes
  • How to check whether the OOM Killer (the kernel force-killing a process when memory runs out) has fired
  • How to move beyond "just restart it" and prevent recurrence

Quick Summary

When memory is suspect, follow this order:

  1. Overview: free -h (memory/swap status)
  2. Find culprit: top (high RES, high CPU)
  3. Top list: ps aux --sort=-%mem | head
  4. OOM evidence: journalctl -k | grep -i oom
  5. Fix: Address the process/service (logs, config, memory limits, scaling, swap)

Table of Contents

  1. Memory Issue Symptoms
  2. free -h for Overview
  3. top to Find "Current Culprit"
  4. ps for Memory Top List
  5. Check OOM Killer Activity
  6. Types of Memory Issues
  7. Quick Recovery Steps
  8. Swap Makes It Extremely Slow
  9. Things to Avoid

Prerequisites

  • OS: Ubuntu
  • Target: Server beginners
  • sudo access
  • Goal: Isolation, recovery, and basic prevention

1. Memory Issue Symptoms

"Memory issues" can look many ways:

  • Server extremely slow (SSH sluggish, commands hang)
  • Apps crash suddenly / 502/503 increase
  • Processes die unexpectedly (errors in logs)
  • Same issue recurs after restart

Important: Even 100% CPU can have a memory cause (swap thrashing keeps the CPU and disk busy shuffling pages). Don't diagnose from CPU alone.

2. free -h for Overview (First Step)

$ free -h

Example output:

              total        used        free      shared  buff/cache   available
Mem:           2.0Gi       1.9Gi        30Mi       120Mi        70Mi        80Mi
Swap:          1.0Gi       1.0Gi         0Mi

Key points:

  • available: The kernel's estimate of memory that can still be handed out without swapping (small = trouble)
  • Swap: If swap is nearly full, you are in danger territory (swap thrashing causes extreme slowness)

Rough guidelines:

  • available down to a few tens of MB → Very dangerous (an OOM kill or extreme slowness is likely)
  • Swap keeps growing → The root cause has likely not been addressed
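
If you want the same numbers in a script- or alert-friendly form, this minimal sketch pulls them from /proc/meminfo and free (values in MiB):

# Script-friendly numbers: MemAvailable and swap usage in MiB
awk '/^MemAvailable/ {printf "available: %d MiB\n", $2/1024}' /proc/meminfo
free -m | awk '/^Swap:/ {printf "swap used: %d of %d MiB\n", $3, $2}'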

3. top to Find "Current Culprit" (Look at RES)

$ top

What beginners should look at:

  • %MEM: Memory usage percentage
  • RES: Resident memory, i.e. the physical RAM the process is actually using (the key column)
  • %CPU: CPU usage percentage

top shortcuts:

  • M: Sort by memory
  • P: Sort by CPU
  • q: Quit
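
If you need to capture what top shows without the interactive UI (for a ticket or a log), batch mode works with the procps top shipped on Ubuntu; a small sketch:

# One-shot snapshot sorted by memory usage, easy to paste into a report
top -b -o %MEM -n 1 | head -n 20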

4. ps for Memory Top List (Create Evidence)

$ ps aux --sort=-%mem | head -n 20

Also check CPU top for comparison:

$ ps aux --sort=-%cpu | head -n 20

Common patterns:

  • One giant process (e.g., java, node, php-fpm, python, db)
  • Many same type (worker runaway, process proliferation)
  • Docker containers multiplying
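
To spot the "many of the same process" pattern at a glance, counting processes by command name helps; a quick sketch:

# Count processes by command name to reveal proliferation (many identical workers)
ps -eo comm= | sort | uniq -c | sort -rn | head -n 10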

5. Check OOM Killer Activity (Often Decisive)

The OOM Killer is the kernel's last resort when the system runs out of memory: it force-kills a process to free RAM. It is a very common cause of processes dying suddenly.

5-1. Search Kernel Logs (Recommended)

$ sudo journalctl -k | grep -i oom | tail -n 50

5-2. Search for "killed process"

$ sudo journalctl -k | grep -i "killed process" | tail -n 50

5-3. dmesg Also Works

$ dmesg | grep -i oom | tail -n 50

Typical log:

Out of memory: Killed process 1234 (node) total-vm:... anon-rss:...

The name of the killed process appears right in that line. That's your main suspect (though occasionally it is just the largest victim of some other process's leak).
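
Two follow-up checks that often help: pinning down exactly when the kill happened, and, on recent systemd versions (an assumption worth verifying on your release), asking systemd whether the unit's last result was an OOM kill:

# When did the OOM kill happen? ISO timestamps line up easily with app logs
sudo journalctl -k -o short-iso | grep -i "out of memory"

# A service killed for memory usually records it on the unit (Result=oom-kill)
systemctl show <service> -p Result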

6. Types of Memory Issues (Response Differs)

Type A: Temporary Spike (Burst)

  • Batch processing, heavy aggregation, image processing, etc.
  • Fix: Isolate processing, memory limits/worker count tuning, add swap, scale up

Type B: Leak / Keeps Growing (Worsens Over Time)

  • Node/Python/Java apps grow over long runtime
  • Fix: Investigate app memory leak, worker/process restart design, set limits

Type C: Process Proliferation (Fork Bomb / Too Many Workers)

  • Too many php-fpm children, or traffic/queue backlog triggering a worker explosion
  • Fix: Set child process/worker limits, load testing, rate limiting
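
For the "set limits" fixes above (Types B and C), one common approach on Ubuntu is a systemd memory limit, so a leaking or runaway service is capped and restarted before it drags down the whole host. A minimal sketch, assuming a hypothetical unit named app.service; the MemoryHigh/MemoryMax values are placeholders to size for your workload:

# Cap a runaway/leaking service with a systemd drop-in (cgroup v2 is the default on Ubuntu 24.04)
sudo systemctl edit app.service
#   -> in the editor, add:
#      [Service]
#      MemoryHigh=400M      # kernel starts reclaiming/throttling above this
#      MemoryMax=512M       # hard cap; the unit gets OOM-killed, not the whole host
#      Restart=on-failure   # bring the service back automatically if it is killed

# Verify the limits are active
systemctl show app.service -p MemoryHigh,MemoryMax

MemoryHigh throttles the service before the hard cap is reached, which often keeps it limping along while you investigate instead of dying outright.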

7. Quick Recovery Steps (Safety First)

7-1. If You Know Which Service Is Heavy: Check Logs First

$ sudo journalctl -u nginx -n 200
$ sudo journalctl -u app -n 200
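
To narrow the log window to the time of the incident and cut noise, journalctl can filter by time and priority (the unit name app here is just a placeholder):

$ sudo journalctl -u app --since "30 min ago" -p warning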

7-2. Service Restart (Last Resort But Sometimes Necessary)

$ sudo systemctl restart <service>
$ sudo systemctl status <service>

Note: If a restart fixes it, you are dealing with the "memory keeps accumulating" type. It will recur if left alone.
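
Because it tends to recur, it pays to save a snapshot of the evidence before (or right when) you restart; a minimal sketch writing everything to a timestamped file in /tmp:

# Save the evidence before it disappears with the restart
ts=$(date +%F-%H%M)
free -h > /tmp/mem-$ts.txt
ps aux --sort=-%mem | head -n 20 >> /tmp/mem-$ts.txt
sudo journalctl -k | grep -i oom | tail -n 50 >> /tmp/mem-$ts.txt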

8. Swap Makes It Extremely Slow (Swap Hell)

When swap usage grows, memory shortage → heavy disk I/O → everything gets even slower: a hellish cycle.

$ free -h

If swap is heavily used, that alone is enough to explain the extreme slowness.
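
To confirm you are really in swap hell, rather than just seeing old idle pages parked in swap, check whether swap is actively moving; a short sketch using tools shipped with Ubuntu:

# Where does swap live and how big is it?
swapon --show

# Watch swap-in (si) / swap-out (so) per second; sustained nonzero values mean thrashing
vmstat 1 5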

9. Things to Avoid

Don't: Restart Repeatedly Without Looking at Cause

Restarting without checking the logs and OOM records just sets you up to repeat the same incident.

Don't: Think Swap Is a "Silver Bullet"

Swap is temporary relief, not a root-cause fix, and heavy swap use itself degrades performance.
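
If you do add swap as a stopgap (for example, to buy time until a scale-up or a proper fix), the usual Ubuntu approach is a swap file; a sketch for a 2 GiB file, with the size as a placeholder:

# Create and enable a 2 GiB swap file
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Make it survive reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab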

Don't: Leave Unlimited Worker/Process Settings Alone

php-fpm children and app workers with no upper limit will eventually eat all memory. Always set limits (see the sketch below for where the php-fpm cap lives).
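
As a concrete illustration, here is a hedged sketch for php-fpm (the config path and the 8.3 version are assumptions; adjust to your install). Divide the RAM you can spare for PHP by the typical RES of one worker to pick pm.max_children:

# Where the php-fpm worker cap lives
grep -E "^pm = |^pm\.max_children" /etc/php/8.3/fpm/pool.d/www.conf

# Rough sizing input: how much RES does one php-fpm worker actually use?
ps aux --sort=-%mem | grep "[p]hp-fpm" | head -n 5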

Copy-Paste Template

# 1) Overview
free -h

# 2) Current culprit (by memory)
top    # press M inside top to sort by memory

# 3) Memory top list (evidence)
ps aux --sort=-%mem | head -n 20

# 4) OOM evidence (decisive)
sudo journalctl -k | grep -i oom | tail -n 50
sudo journalctl -k | grep -i "killed process" | tail -n 50

Summary

  • Check free -h for available and Swap first
  • Use ps aux --sort=-%mem to identify top consumers
  • Use journalctl -k | grep -i oom to confirm OOM
  • If a restart only fixes it temporarily, it will almost certainly recur. Classify the issue and address the root cause.

Test Environment

Commands in this article were tested on Ubuntu 24.04 LTS / bash 5.2.

Next Reading