How to Handle OOM Killer Events - When Processes Are Killed by Memory Exhaustion

What is the OOM killer?

The OOM killer (Out-Of-Memory killer) is a Linux kernel mechanism that forcibly terminates processes when the kernel can no longer reclaim enough memory, typically once both physical memory and swap are exhausted. Its goal is to prevent a full system crash. When a process disappears without explanation (no application-level crash log, and systemctl status reports that it was killed by SIGKILL), the OOM killer is the first thing to check.

Symptoms to look for

  • A process crashes with no application-level error
  • systemctl status shows the main process as code=killed, signal=KILL (newer systemd versions may also report Result: oom-kill)
  • dmesg contains Out of memory: Killed process

How to confirm an OOM killer event in logs

When the OOM killer fires, the kernel always writes a record to the kernel ring buffer and system log. Start here.

Check with dmesg

sudo dmesg | grep -i "out of memory"
sudo dmesg | grep -i "oom"
[1234567.890] Out of memory: Killed process 12345 (nginx) total-vm:512000kB, anon-rss:256000kB, file-rss:8000kB, shmem-rss:0kB, UID:0 pgtables:512kB oom_score_adj:0

Search kernel log with journalctl

sudo journalctl -k --since "2 hours ago" | grep -i oom
sudo journalctl -k -g "Out of memory"

The -k flag filters to kernel messages only. Use --since to narrow the time window.

Check syslog / kern.log

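# Debian/Ubuntu paths first; RHEL-family systems log to /var/log/messages instead
sudo grep -i "out of memory" /var/log/syslog
sudo grep -i "oom-killer" /var/log/kern.log
sudo grep -i "out of memory" /var/log/messages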

dmesg timestamps are seconds since boot. Use dmesg -T to display human-readable timestamps.

Identifying which process was killed

The log records detailed information about the killed process.

sudo dmesg | grep "Killed process"
[1234567.890] Killed process 12345 (nginx) total-vm:512000kB, anon-rss:256000kB

Reading the output:

Field            Meaning
12345            Process PID
(nginx)          Process name
total-vm         Total virtual memory allocated
anon-rss         Anonymous pages actually in RAM
file-rss         File-backed pages in RAM
oom_score_adj    Kill priority adjustment (0 is default)
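
For scripting, these fields can be pulled out of the kill line with sed. The following is a minimal sketch: parse_oom_kills is a hypothetical helper name, and it assumes the message format shown in the sample line, which varies between kernel versions.

```shell
# parse_oom_kills (hypothetical helper): read kernel log lines on stdin and
# print "PID NAME ANON_RSS_KB" for every "Killed process" record.
# Assumes the line format shown in the sample output; other kernels may differ.
parse_oom_kills() {
  sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*anon-rss:\([0-9][0-9]*\)kB.*/\1 \2 \3/p'
}
```

Used as sudo dmesg | parse_oom_kills, this prints one line per kill, for example 12345 nginx 256000 for the sample record. Lines that do not match the pattern are skipped silently.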

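To see memory state at the time of the event, print the full OOM report. The kernel logs its memory summary and per-process table before the final "Out of memory" line, starting at the "invoked oom-killer" line, so search from there:

sudo dmesg -T | grep -A 60 "invoked oom-killer" | head -80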

How OOM scores work

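The kernel assigns each process an oom_score (normally 0–1000). A higher score means the process is killed earlier. The score is based mainly on the process's memory footprint (resident memory, swap, and page tables) as a share of the memory available to it, and is then shifted by oom_score_adj.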

Check the current score

cat /proc/<PID>/oom_score

List all processes sorted by OOM score (highest first):

# Columns: oom_score, PID, command name
for dir in /proc/[0-9]*; do
  printf '%s %s %s\n' "$(cat "$dir/oom_score" 2>/dev/null)" "${dir#/proc/}" "$(cat "$dir/comm" 2>/dev/null)"
done | sort -rn | head -20

Adjusting the score with oom_score_adj

Write a value from -1000 to 1000 to /proc/<PID>/oom_score_adj to bias the score.

Value    Effect
-1000    Completely excluded from OOM kill
-500     Much harder to kill
0        Default behavior
500      Easier to kill
1000     Killed first

# Make nginx much harder to OOM-kill (temporary); pgrep may return several PIDs
for pid in $(pgrep nginx); do echo -500 | sudo tee "/proc/$pid/oom_score_adj"; done

A process set to -1000 can never be killed by the OOM killer. If that process has a memory leak, the system itself may crash. Restrict this to critical system services.
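
Because a mistyped value can silently protect or expose the wrong process, a small wrapper that checks the range before writing can help. This is a minimal sketch: set_oom_adj is a hypothetical helper, and the PROC_ROOT variable is an assumption added so the function can be tried against a scratch directory instead of /proc.

```shell
# set_oom_adj VALUE PID... (hypothetical helper)
# Validates VALUE against the -1000..1000 range, then writes it to each
# PID's oom_score_adj file. PROC_ROOT is an assumption for dry runs and
# defaults to /proc.
set_oom_adj() {
  value=$1
  case $value in
    ''|-|*[!0-9-]*|?*-*)
      echo "set_oom_adj: not an integer: $value" >&2
      return 1 ;;
  esac
  shift
  if [ "$value" -lt -1000 ] || [ "$value" -gt 1000 ]; then
    echo "set_oom_adj: out of range (-1000..1000): $value" >&2
    return 1
  fi
  for pid in "$@"; do
    printf '%s\n' "$value" > "${PROC_ROOT:-/proc}/$pid/oom_score_adj" || return 1
  done
}
```

Lowering a score requires root privileges, so the function is meant to be run from a root shell, for example set_oom_adj -500 $(pgrep nginx).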

Immediate response — restart and reclaim memory

After the OOM killer fires, available memory should be restored. However, the root cause likely persists and the event will recur.

Restart the killed service

sudo systemctl restart <service-name>

Manually drop page cache

sync
echo 3 | sudo tee /proc/sys/vm/drop_caches

Find the top memory consumers

ps aux --sort=-%mem | head -20
free -h
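
free -h is meant for reading by eye. For a cron job or an alert, the same figures can be read from /proc/meminfo. A minimal sketch follows: mem_available_pct is a hypothetical helper, and the 10 percent threshold in the usage note is an arbitrary example, not a recommendation.

```shell
# mem_available_pct [FILE] (hypothetical helper)
# Prints MemAvailable as a whole-number percentage of MemTotal.
# FILE defaults to /proc/meminfo; passing another path allows offline testing.
mem_available_pct() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END {
      if (total <= 0) exit 1
      printf "%d\n", avail * 100 / total
    }
  ' "${1:-/proc/meminfo}"
}
```

A check such as [ "$(mem_available_pct)" -lt 10 ] && echo "warning: low memory" could then run on a schedule to flag pressure before the OOM killer fires.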

Making oom_score_adj persistent

The oom_score_adj value resets when a process restarts. To persist it, add the setting to the systemd unit file.

sudo systemctl edit <service-name>

Add the following:

[Service]
OOMScoreAdjust=-500

Apply the change:

sudo systemctl daemon-reload
sudo systemctl restart <service-name>

Verify:

cat /proc/$(systemctl show -p MainPID --value <service-name>)/oom_score_adj
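
systemctl edit opens an interactive editor, which is awkward in provisioning scripts. It stores its result as a drop-in file under /etc/systemd/system/<unit>.d/, so the same effect can be scripted. A minimal sketch: write_oom_dropin, the oom.conf file name, and the UNIT_DIR override are all hypothetical names chosen for illustration.

```shell
# write_oom_dropin UNIT ADJ (hypothetical helper)
# Writes a drop-in file that sets OOMScoreAdjust for UNIT.
# UNIT_DIR is an assumption for testing; it defaults to /etc/systemd/system.
write_oom_dropin() {
  unit=$1
  adj=$2
  dir="${UNIT_DIR:-/etc/systemd/system}/$unit.d"
  mkdir -p "$dir" || return 1
  printf '[Service]\nOOMScoreAdjust=%s\n' "$adj" > "$dir/oom.conf"
}
```

Run as root, write_oom_dropin nginx.service -500 followed by systemctl daemon-reload and a service restart should have the same effect as the interactive steps.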

Adding swap to reduce memory pressure

Missing or insufficient swap makes OOM killer events more likely. Adding a swap file provides a buffer for sudden memory spikes.

Create a swap file

# Create a 2 GB swap file
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Confirm it is active:

swapon --show
free -h
              total        used        free      shared  buff/cache   available
Mem:           3.8G        3.1G        100M         50M        600M        600M
Swap:          2.0G          0B        2.0G

Persist across reboots

echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

Swap is not a permanent fix

Swap cushions memory spikes but does not fix the underlying cause. If memory usage grows continuously over time, investigate for a memory leak. Also consider tuning swappiness (default 60; 10–20 is common for servers).
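
Swappiness is exposed as the vm.swappiness sysctl. The following is a minimal sketch of checking and persisting it: swappiness_conf is a hypothetical helper, 99-swappiness.conf is an arbitrary file name, and the 0 to 100 check is a conservative assumption (some newer kernels accept larger values).

```shell
# swappiness_conf VALUE (hypothetical helper)
# Prints a sysctl.d line for vm.swappiness after a basic sanity check.
swappiness_conf() {
  case $1 in
    ''|*[!0-9]*)
      echo "swappiness_conf: not a non-negative integer: $1" >&2
      return 1 ;;
  esac
  if [ "$1" -gt 100 ]; then
    echo "swappiness_conf: expected 0..100: $1" >&2
    return 1
  fi
  printf 'vm.swappiness = %s\n' "$1"
}

# Usage (commented out because it changes system state):
# cat /proc/sys/vm/swappiness                                      # current value
# swappiness_conf 10 | sudo tee /etc/sysctl.d/99-swappiness.conf   # persist
# sudo sysctl --system                                             # apply now
```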

Root cause — audit memory configuration

Recurring OOM killer events mean the system is genuinely running out of memory. Score tuning and swap are mitigations, not fixes.

Common root causes

  1. Memory leak — RSS grows over time without release
  2. Oversized memory config — JVM -Xmx, innodb_buffer_pool_size, or similar exceeds physical RAM
  3. Too many concurrent workers — Apache MaxRequestWorkers, PHP-FPM pm.max_children, etc.

Monitor RSS growth over time

# Sample memory every 5 seconds
watch -n 5 "ps -p <PID> -o pid,rss,vsz,comm"
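
watch shows only the current value. To judge whether RSS is actually growing, it helps to record timestamped samples and compare them later. A minimal sketch: rss_kb is a hypothetical helper that reads VmRSS from /proc/<PID>/status, and PROC_ROOT is an assumption added so the function can be tested against a scratch directory.

```shell
# rss_kb PID (hypothetical helper)
# Prints the resident set size of PID in kB, read from its status file.
# Returns non-zero if the process is gone or reports no VmRSS.
rss_kb() {
  awk '/^VmRSS:/ { print $2; found = 1 } END { exit !found }' \
    "${PROC_ROOT:-/proc}/$1/status"
}

# Example loop (commented out): append "epoch-seconds rss-kB" once a minute
# until the process exits.
# while rss=$(rss_kb "$PID"); do echo "$(date +%s) $rss"; sleep 60; done >> rss.log
```

A steadily rising second column over hours or days, with no matching rise in load, points toward a leak rather than a sizing problem.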

Cap service memory with cgroup (via systemd)

Setting an explicit memory ceiling per service isolates failures and prevents one runaway process from taking down the system.

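sudo systemctl edit <service-name>

Add the following. MemoryMax= and MemorySwapMax= are cgroup v2 settings; hosts still on cgroup v1 use the older MemoryLimit= directive instead.

[Service]
MemoryMax=512M
MemorySwapMax=0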

When the limit is hit, the kernel OOM-kills processes inside that service's cgroup only; processes outside it are unaffected.
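
To confirm that a service was killed by its own limit rather than by system-wide pressure, cgroup v2 keeps a per-cgroup counter in the memory.events file. A minimal sketch: cgroup_oom_kills is a hypothetical helper, and the path in the usage note assumes the service runs under system.slice on a cgroup v2 host.

```shell
# cgroup_oom_kills FILE (hypothetical helper)
# Prints the oom_kill counter from a cgroup v2 memory.events file.
# Matches the key exactly so that similar keys are not picked up.
cgroup_oom_kills() {
  awk '$1 == "oom_kill" { print $2; found = 1 } END { exit !found }' "$1"
}
```

For example, cgroup_oom_kills /sys/fs/cgroup/system.slice/nginx.service/memory.events prints how many times a process in that cgroup has been OOM-killed; a non-zero value suggests the MemoryMax ceiling is being reached.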
