How to Handle OOM Killer Events - When Processes Are Killed by Memory Exhaustion
What is the OOM killer?
The OOM killer (Out-Of-Memory killer) is a Linux kernel mechanism that forcibly terminates processes when both physical memory and swap are exhausted. Its goal is to prevent a full system crash. When a process disappears without explanation (no application-level crash log, and systemctl status reports that it was terminated by SIGKILL), the OOM killer is the first thing to check.
Symptoms to look for
- A process crashes with no application-level error
- systemctl status shows exit-code=SIGKILL
- dmesg contains Out of memory: Killed process
How to confirm an OOM killer event in logs
When the OOM killer fires, the kernel writes a record to the kernel ring buffer, and the system logger normally copies it to the system log. Start here.
Check with dmesg
sudo dmesg | grep -i "out of memory"
sudo dmesg | grep -i "oom"
[1234567.890] Out of memory: Killed process 12345 (nginx) total-vm:512000kB, anon-rss:256000kB, file-rss:8000kB, shmem-rss:0kB, UID:0 pgtables:512kB oom_score_adj:0
Search kernel log with journalctl
sudo journalctl -k --since "2 hours ago" | grep -i oom
sudo journalctl -k -g "Out of memory"
The -k flag filters to kernel messages only. Use --since to narrow the time window.
Check syslog / kern.log
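sudo grep -i "out of memory" /var/log/syslog
sudo grep -i "oom-killer" /var/log/kern.log
The kernel spells the message "invoked oom-killer" with a hyphen, so search for that form. These paths are the Debian/Ubuntu defaults; RHEL-family systems log to /var/log/messages instead.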
dmesg timestamps are seconds since boot. Use dmesg -T to display human-readable timestamps.
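When the same host is hit repeatedly, it helps to see which process names are killed most often. The following is a minimal sketch that counts "Killed process" records per process name from kernel log text on standard input. The function name count_oom_kills is hypothetical, and the sketch assumes the log lines have the format shown in the dmesg example above.

```shell
# Count OOM kills per process name from kernel log text on stdin.
# count_oom_kills is a hypothetical helper name, not a standard tool.
count_oom_kills() {
    # Keep only the "Killed process <PID> (<name>)" part of each record,
    # reduce it to the name, then count identical names.
    grep -o 'Killed process [0-9]* ([^)]*)' |
        sed 's/.*(\(.*\))/\1/' |
        sort | uniq -c | sort -rn
}

# Usage (needs access to the kernel log):
#   sudo dmesg | count_oom_kills
#   sudo journalctl -k --since "7 days ago" | count_oom_kills
```

Reading from standard input keeps the helper independent of where the log comes from, so the same function works with dmesg, journalctl, or a saved log file.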
Identifying which process was killed
The log records detailed information about the killed process.
sudo dmesg | grep "Killed process"
[1234567.890] Out of memory: Killed process 12345 (nginx) total-vm:512000kB, anon-rss:256000kB, file-rss:8000kB, shmem-rss:0kB, UID:0 pgtables:512kB oom_score_adj:0
Reading the output:
| Field | Meaning |
|---|---|
| 12345 | Process PID |
| (nginx) | Process name |
| total-vm | Total virtual memory allocated |
| anon-rss | Anonymous pages actually in RAM |
| file-rss | File-backed pages in RAM |
| oom_score_adj | Kill priority adjustment (0 is default) |
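The fields can also be pulled out mechanically. The sketch below reduces each "Killed process" record on standard input to the PID, the process name, and anon-rss converted to MiB. The function name summarize_oom_kills is hypothetical, and the pattern assumes the record layout shown in the sample output above; kernels that print the fields differently would need an adjusted pattern.

```shell
# Summarize "Killed process" records from stdin as PID, name, anon-rss in MiB.
# summarize_oom_kills is a hypothetical helper name.
summarize_oom_kills() {
    # sed extracts "PID|name|anon_rss_kB"; awk formats it and converts kB to MiB.
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*anon-rss:\([0-9][0-9]*\)kB.*/\1|\2|\3/p' |
        awk -F'|' '{ printf "pid=%s name=%s anon_rss=%dMiB\n", $1, $2, $3 / 1024 }'
}

# Usage (needs access to the kernel log):
#   sudo dmesg | summarize_oom_kills
```

anon-rss is usually the most informative number here, because anonymous pages cannot be dropped the way file-backed pages can.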
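To see the memory state at the time of the event, print the report the kernel writes when the OOM killer is invoked. The report starts at the "invoked oom-killer" line and comes before the "Out of memory: Killed process" line:
sudo dmesg | grep -A 30 "invoked oom-killer" | head -50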
How OOM scores work
The kernel assigns each process an oom_score (0–1000). A higher score means the process is more likely to be killed. The score is based primarily on the process's memory usage relative to the total memory available (RAM plus swap), then shifted by oom_score_adj.
Check the current score
cat /proc/<PID>/oom_score
List all processes sorted by OOM score (highest first):
# Print "score PID command" for every process, highest score first
for dir in /proc/[0-9]*; do
  printf '%s %s %s\n' "$(cat "$dir/oom_score" 2>/dev/null)" "${dir#/proc/}" "$(cat "$dir/comm" 2>/dev/null)"
done | sort -rn | head -20
Adjusting the score with oom_score_adj
Write a value from -1000 to 1000 to /proc/<PID>/oom_score_adj to bias the score.
| Value | Effect |
|---|---|
| -1000 | Completely excluded from OOM kill |
| -500 | Much harder to kill |
| 0 | Default behavior |
| 500 | Easier to kill |
| 1000 | Killed first |
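# Make nginx much harder to OOM-kill (temporary)
# pgrep -o selects only the oldest matching process, so the path stays valid when several nginx processes run
echo -500 | sudo tee /proc/$(pgrep -o nginx)/oom_score_adj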
A process set to -1000 can never be killed by the OOM killer. If that process has a memory leak, the system itself may crash. Restrict this to critical system services.
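Because a mistyped value can either be rejected by the kernel or quietly protect the wrong process, a small guard around the write can be useful. The following is a minimal sketch, not an established tool: set_oom_adj is a hypothetical helper, and its optional third argument (an alternative /proc root) is an assumption added only so the function can be tried against a scratch directory.

```shell
# Write an oom_score_adj value after checking it is an integer in -1000..1000.
# set_oom_adj is a hypothetical helper name. Variables are not function-local
# in POSIX sh, so avoid reusing the names pid, value, and root elsewhere.
set_oom_adj() {
    pid=$1 value=$2 root=${3:-/proc}
    # Reject empty input, a lone "-", non-digit characters, and misplaced "-".
    case $value in
        ''|-|*[!0-9-]*|?*-*) echo "not an integer: $value" >&2; return 2 ;;
    esac
    if [ "$value" -lt -1000 ] || [ "$value" -gt 1000 ]; then
        echo "out of range (-1000..1000): $value" >&2
        return 2
    fi
    printf '%s\n' "$value" > "$root/$pid/oom_score_adj"
}

# Usage (lowering a score requires root, so run this from a root shell):
#   set_oom_adj "$(pgrep -o nginx)" -500
```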
Immediate response — restart and reclaim memory
After the OOM killer fires, available memory should be restored. However, the root cause likely persists and the event will recur.
Restart the killed service
sudo systemctl restart <service-name>
Manually drop page cache
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
Dropping caches forces subsequent I/O to read from disk, causing an I/O spike. Use with caution on production systems.
Find the top memory consumers
ps aux --sort=-%mem | head -20
free -h
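A per-process listing can hide services that run many workers, each of which looks small on its own. As a rough sketch, the helper below sums resident memory per command name and prints the largest totals in MiB. The name rss_by_command is hypothetical. Two caveats apply: summing RSS counts shared pages once per process, so totals overstate real usage, and command names containing spaces are truncated to their first word.

```shell
# Sum resident memory per command name from "RSS_kB COMMAND" lines on stdin
# and print the largest totals first, in MiB.
# rss_by_command is a hypothetical helper name; its argument is the row limit.
rss_by_command() {
    awk '{ total[$2] += $1 }
         END { for (name in total) printf "%d %s\n", total[name] / 1024, name }' |
        sort -rn | head -n "${1:-10}"
}

# Usage (the trailing "=" suppresses the ps column headers):
#   ps -eo rss=,comm= | rss_by_command 10
```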
Making oom_score_adj persistent
The oom_score_adj value resets when a process restarts. To persist it, add the setting to the systemd unit file.
sudo systemctl edit <service-name>
Add the following:
[Service]
OOMScoreAdjust=-500
Apply the change:
sudo systemctl daemon-reload
sudo systemctl restart <service-name>
Verify:
cat /proc/$(systemctl show -p MainPID --value <service-name>)/oom_score_adj
Adding swap to reduce memory pressure
Missing or insufficient swap makes OOM killer events more likely. Adding a swap file provides a buffer for sudden memory spikes.
Create a swap file
# Create a 2 GB swap file
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
Confirm it is active:
free -h
swapon --show
              total        used        free      shared  buff/cache   available
Mem:           3.8G        3.1G        100M         50M        600M        600M
Swap:          2.0G          0B        2.0G
Persist across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
Swap is not a permanent fix
Swap cushions memory spikes but does not fix the underlying cause. If memory usage grows continuously over time, investigate for a memory leak. Also consider tuning swappiness (default 60; 10–20 is common for servers).
Root cause — audit memory configuration
Recurring OOM killer events mean the system is genuinely running out of memory. Score tuning and swap are mitigations, not fixes.
Common root causes
- Memory leak: RSS grows over time without release
- Oversized memory config: JVM -Xmx, innodb_buffer_pool_size, or similar exceeds physical RAM
- Too many concurrent workers: Apache MaxRequestWorkers, PHP-FPM pm.max_children, etc.
Monitor RSS growth over time
# Sample memory every 5 seconds
watch -n 5 "ps -p <PID> -o pid,rss,vsz,comm"
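watch only shows the current value, so a slow leak is easy to miss. One possible approach, sketched below, is to log RSS samples to a file and compare the first and last values. The helper names rss_kb, log_rss, and rss_growth are hypothetical, as are the default interval and log file name; the sketch assumes the VmRSS line format of /proc/<PID>/status.

```shell
# Read VmRSS (in kB) from a /proc/<PID>/status-style file.
# rss_kb, log_rss, and rss_growth are hypothetical helper names.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Append "epoch_seconds rss_kB" to a log file every INTERVAL seconds
# until the process exits. Defaults (60 s, rss.log) are arbitrary choices.
log_rss() {
    pid=$1 interval=${2:-60} out=${3:-rss.log}
    while [ -r "/proc/$pid/status" ]; do
        printf '%s %s\n' "$(date +%s)" "$(rss_kb "/proc/$pid/status")" >> "$out"
        sleep "$interval"
    done
}

# Growth in kB between the first and last sample of such a log.
rss_growth() {
    awk 'NR == 1 { first = $2 } { last = $2 } END { print last - first }' "$1"
}

# Usage:
#   log_rss <PID> 60 /tmp/rss.log &
#   rss_growth /tmp/rss.log
```

A value that keeps rising across hours of steady load points toward a leak; a value that rises and then plateaus is more likely a cache or pool filling to its configured size.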
Cap service memory with cgroup (via systemd)
Setting an explicit memory ceiling per service isolates failures and prevents one runaway process from taking down the system.
sudo systemctl edit <service-name>
[Service]
MemoryMax=512M
MemorySwapMax=0
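When the service reaches the limit and the kernel cannot reclaim enough memory inside its cgroup, only processes in that service are OOM-killed; other processes are unaffected. MemoryMax= and MemorySwapMax= require the unified cgroup hierarchy (cgroup v2).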
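To check whether a service has been killed because of its own limit rather than system-wide exhaustion, the cgroup's memory.events file keeps an oom_kill counter. The sketch below reads that counter. The function name cgroup_oom_kills is hypothetical, and the path in the usage comment assumes cgroup v2 mounted at /sys/fs/cgroup with the service placed under system.slice, which may differ on a given system.

```shell
# Print the oom_kill counter from a cgroup v2 memory.events file.
# cgroup_oom_kills is a hypothetical helper name.
cgroup_oom_kills() {
    # Match the key exactly so similarly named counters are not picked up.
    awk '$1 == "oom_kill" { print $2 }' "$1"
}

# Usage (path is an assumption; adjust to where the service's cgroup lives):
#   cgroup_oom_kills /sys/fs/cgroup/system.slice/<service-name>.service/memory.events
```

A counter that increases after each restart suggests the limit is too low for the workload or that the service is leaking memory.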