How to Troubleshoot Disk I/O on Linux - iostat and vmstat Guide
What You'll Learn
- How to determine if disk I/O is the bottleneck when the server is slow
- How to use iostat/vmstat to distinguish CPU wait from disk congestion
- How to escape the "don't know what's happening" state
Quick Summary
When suspecting disk I/O issues:
- CPU idle but slow? -> iostat -xz 1
- High I/O wait? -> Check %iowait
- Write congestion? -> Check await / %util
- Confirm disk is the cause? -> vmstat 1
Prerequisites
- OS: Ubuntu
- Target: Server beginners
- Goal: Isolation and diagnosis
1. What is "Slow Disk I/O"?
Conclusion: Slow disk I/O typically shows up as a rising %iowait even when CPU utilization looks low.
"Slow disk" usually means one of these:
- Application waiting for disk response
- Log writes getting backed up
- DB / Docker / backups consuming I/O
- CPU is idle but stuck waiting on I/O
Looking at CPU% alone can easily mislead you.
2. Installing iostat
Conclusion: Install sysstat via apt to get iostat; it is not installed by default on Ubuntu. (vmstat comes from the procps package and is normally already present.)
$ sudo apt update
$ sudo apt install -y sysstat
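To check up front which of the tools used in this guide are missing, a small POSIX sh sketch like the following may help. The function name `missing_tools` is hypothetical (not part of sysstat or procps); it only checks `command -v` and prints the names it cannot find, without installing anything.

```shell
# missing_tools (hypothetical helper): print every command name given
# as an argument that is NOT found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools iostat vmstat` would print only `iostat` on a machine where procps is present but sysstat has not been installed yet, and nothing once both tools are available.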
3. Using iostat -xz
Conclusion: Run iostat -xz 1 for per-second stats; -x reveals the key I/O metrics.
$ iostat -xz 1
Options:
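- -x: Extended stats (essential)
- -z: Omit devices with no activity during the interval (cleaner output)
- 1: Update every 1 second
The first report shows averages since boot, so judge from the second report onward (or add -y to skip the first report).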
4. Key Metrics (Core Knowledge)
Conclusion: Focus on %iowait (CPU wait), %util (saturation), and await (latency) together.
4-1. %iowait (CPU side)
- 10%+: Notable I/O wait
- 20%+: Disk I/O is almost certainly the bottleneck
The CPU is idle, but only because the work it would do is stuck waiting on disk.
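To pull %iowait out of iostat's CPU report in a script, here is a minimal sketch. It assumes the avg-cpu layout iostat prints (column names on one line, values on the next) and looks the column up by name; `iowait_pct` and `iowait_level` are hypothetical helper names, and the level labels simply encode the 10% / 20% thresholds above.

```shell
# iowait_pct (hypothetical helper): read iostat output on stdin and print
# the %iowait value of every avg-cpu report, one per line.
iowait_pct() {
  awk '
    $1 == "avg-cpu:" {
      # Header line: find which value column holds %iowait.
      # The values line has no avg-cpu label, hence the -1 shift.
      for (i = 2; i <= NF; i++) if ($i == "%iowait") col = i - 1
      want = 1
      next
    }
    want && NF > 0 { print $col; want = 0 }
  '
}

# iowait_level (hypothetical helper): map a %iowait number to the
# thresholds in this guide (10%+ notable, 20%+ bottleneck).
iowait_level() {
  awk -v v="$1" 'BEGIN {
    if (v + 0 >= 20)      print "bottleneck"
    else if (v + 0 >= 10) print "notable"
    else                  print "ok"
  }'
}
```

Since the first report covers the time since boot, take the last value, for example: `v=$(LC_ALL=C iostat -c 1 3 | iowait_pct | tail -n 1); iowait_level "$v"`. Setting `LC_ALL=C` should keep the decimal separator a dot regardless of the system locale.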
4-2. %util (Disk side)
- 70% or less: Plenty of headroom
- 80-90%: Getting congested
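- 100% constant: Completely saturated
Caveat: for devices that serve requests in parallel, such as SSDs, NVMe drives, and RAID arrays, %util near 100% does not necessarily mean the device is at its limit; check await alongside it.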
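4-3. await (Wait time)
Average time in milliseconds for an I/O request to complete, including time spent in the queue. Newer sysstat versions may show r_await (reads) and w_await (writes) instead of a single await column.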
- Few ms: Normal
- 10ms+: Slow
- 50ms+: Noticeably slow
- 100ms+: Critical
4-4. r/s w/s (Read/Write rate)
Tells you whether reads or writes are the problem. Log bloat and DB writes spike w/s.
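The per-device metrics from 4-2 to 4-4 can be summarized in one pass. The awk-based sketch below (hypothetical helper `disk_summary`) finds the columns by header name, because the `iostat -x` layout differs between sysstat versions. When there is no single await column, it approximates one as the request-weighted average of r_await and w_await (ignoring any discard/flush columns); the flags reuse the 80% %util and 10 ms await thresholds above.

```shell
# disk_summary (hypothetical helper): read `iostat -x` output on stdin and
# print one line per device with r/s, w/s, await, %util and simple flags.
disk_summary() {
  awk '
    $1 == "Device" || $1 == "Device:" {
      # Header line: map column names to positions (layout varies by version)
      split("", c)
      for (i = 1; i <= NF; i++) c[$i] = i
      inside = 1
      next
    }
    inside && NF == 0 { inside = 0; next }   # blank line ends the device block
    inside {
      r = $(c["r/s"]) + 0; w = $(c["w/s"]) + 0; util = $(c["%util"]) + 0
      if ("await" in c)
        aw = $(c["await"]) + 0               # older sysstat: single await column
      else if (r + w > 0)
        aw = (r * $(c["r_await"]) + w * $(c["w_await"])) / (r + w)   # weighted approximation
      else
        aw = 0
      note = (r + w == 0) ? "idle" : ((w > r) ? "write-heavy" : "read-heavy")
      if (util >= 80) note = note ",util-high"
      if (aw >= 10)   note = note ",await-high"
      printf "%s r/s=%.2f w/s=%.2f await=%.2f %%util=%.2f %s\n", $1, r, w, aw, util, note
    }
  '
}
```

For instance: `LC_ALL=C iostat -dxz 1 3 | disk_summary`, ignoring the first block of lines because it reflects averages since boot.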
5. Common Patterns
Conclusion: Three patterns: iowait+util high, CPU+I/O combined, or await high with low util.
Pattern A: CPU low but slow
- %iowait: high
- %util: high
Classic disk I/O bottleneck
Pattern B: Both CPU and I/O high
- %user / %system: high
- %iowait: also high
Heavy processing + heavy disk writes combined
Pattern C: %util low but still slow
- %util: low
- await: high
Storage itself is slow (network storage, EBS, etc.)
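The three patterns can be written down as a rough decision function. This is a sketch, not a diagnostic tool: `which_pattern` is a hypothetical helper, the %iowait, %util and await cutoffs reuse the thresholds from section 4, and the 70% cutoff for "CPU high" (%user + %system) is an assumption, since this guide does not give one.

```shell
# which_pattern (hypothetical helper): classify a snapshot into pattern A/B/C.
# Args: $1=%iowait  $2=%util  $3=await (ms)  $4=%user+%system
# The 70% CPU cutoff is an assumed value, not from the guide.
which_pattern() {
  awk -v io="$1" -v util="$2" -v aw="$3" -v cpu="$4" 'BEGIN {
    io += 0; util += 0; aw += 0; cpu += 0
    if (cpu >= 70 && io >= 10)       print "B: heavy CPU work plus heavy disk I/O"
    else if (io >= 10 && util >= 80) print "A: classic disk I/O bottleneck"
    else if (util < 80 && aw >= 10)  print "C: storage itself is slow"
    else                             print "no clear disk pattern"
  }'
}
```

For example, `which_pattern 25 95 30 10` reports pattern A, while `which_pattern 2 30 45 10` (low %util, high await) reports pattern C.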
6. Using vmstat
Conclusion: Use vmstat 1; a high b column with a low r column confirms I/O is the bottleneck.
$ vmstat 1
Key columns:
- r: Processes that are runnable (running or waiting for CPU)
- b: Processes blocked waiting on I/O (high = trouble)
- wa: I/O wait (10%+ = warning)
If r is low but b is high, I/O is the cause, not CPU.
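To apply the r vs b rule to a whole vmstat run, the sketch below (hypothetical helper `vmstat_io_check`) reads vmstat output, locates the r, b and wa columns from the header line, and flags each sample where b exceeds r and where wa reaches the 10% warning level.

```shell
# vmstat_io_check (hypothetical helper): read vmstat output on stdin and
# print r, b, wa for each sample plus a verdict.
vmstat_io_check() {
  awk '
    $1 == "r" && $2 == "b" {
      # Column header line: remember where r, b and wa are
      for (i = 1; i <= NF; i++) c[$i] = i
      next
    }
    ("b" in c) && $1 ~ /^[0-9]+$/ {
      r = $(c["r"]) + 0; b = $(c["b"]) + 0; wa = $(c["wa"]) + 0
      note = (b > r) ? "io-suspect" : "ok"   # more blocked than runnable
      if (wa >= 10) note = note ",wa-high"   # 10% or more I/O wait = warning
      printf "r=%d b=%d wa=%d %s\n", r, b, wa, note
    }
  '
}
```

Usage: `vmstat 1 10 | vmstat_io_check`. As with iostat, the first data line is an average since boot. Looking columns up by name is a deliberate choice, so that extra columns in some procps versions do not shift the result.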
7. Confirming Disk is the Culprit
Conclusion: Disk is confirmed when %iowait, %util, await, and vmstat's b are all elevated.
If ALL of these are true, disk is almost certainly the cause:
- %iowait is high
- %util is high
- await is spiking
- vmstat's b column is increasing
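The checklist can also be expressed as a small scoring sketch. `disk_confirmed` is a hypothetical helper; it reuses the section 4 thresholds (10% %iowait, 80% %util, 10 ms await) and, as a simplification, treats "b is increasing" as the last vmstat b sample being larger than the first.

```shell
# disk_confirmed (hypothetical helper): count how many checklist items hold.
# Args: $1=%iowait  $2=%util  $3=await (ms)  $4=first b sample  $5=last b sample
disk_confirmed() {
  awk -v io="$1" -v util="$2" -v aw="$3" -v b0="$4" -v b1="$5" 'BEGIN {
    # Each comparison yields 1 (true) or 0 (false), so the sum is the score
    n = (io + 0 >= 10) + (util + 0 >= 80) + (aw + 0 >= 10) + (b1 + 0 > b0 + 0)
    printf "%d/4 checks met: %s\n", n, (n == 4 ? "disk is almost certainly the cause" : "keep investigating")
  }'
}
```

For example, `disk_confirmed 22 95 40 1 6` prints `4/4 checks met: disk is almost certainly the cause`.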
8. Common Causes
Conclusion: Docker, databases, log bloat, and backups are the most common disk I/O culprits.
- Docker (logs, layers, overlayfs)
- Databases (MySQL / PostgreSQL)
- Log bloat (access.log, error.log)
- Backups (rsync, tar)
- Heavy cron jobs
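For the log-bloat case, a quick way to see which files are largest under a directory is sketched below. `largest_files` is a hypothetical helper; it relies on GNU find's `-printf` (standard on Ubuntu) and stays on one filesystem with `-xdev`.

```shell
# largest_files (hypothetical helper): list the biggest regular files
# under a directory as "<size in bytes> <path>", largest first.
# Args: $1=directory  $2=number of files to show (default 10)
largest_files() {
  find "$1" -xdev -type f -printf '%s %p\n' 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

From a root shell, for example: `largest_files /var/log 10`. With Docker's default json-file logging driver, container logs live under `/var/lib/docker/containers`, so `largest_files /var/lib/docker/containers 5` can reveal a runaway container log. A large file only shows where space goes; confirm with iostat w/s that it is also driving the writes.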
9. What NOT to Do
Conclusion: Three mistakes: CPU-only judgment, rebooting early, and trusting %util alone.
NG1: Judging "All Clear" by CPU Alone
A classic mistake: CPU usage looks low, so the high I/O wait goes unnoticed.
NG2: Rebooting Immediately
The I/O evidence disappears. Observe first.
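One way to follow "observe first" is to save a short iostat/vmstat capture to disk before taking any action. The sketch below (hypothetical helper `capture_evidence`) runs both tools in parallel for a given number of 1-second samples and writes timestamped log files, skipping a tool that is not installed.

```shell
# capture_evidence (hypothetical helper): save iostat/vmstat samples to files.
# Args: $1=output directory  $2=number of 1-second samples (default 30)
capture_evidence() {
  dir=$1; n=${2:-30}
  mkdir -p "$dir" || return 1
  ts=$(date +%Y%m%d-%H%M%S)
  if command -v iostat >/dev/null 2>&1; then
    LC_ALL=C iostat -xz 1 "$n" > "$dir/iostat-$ts.log" 2>&1 &
  else
    echo "iostat not found (install sysstat)" >&2
  fi
  if command -v vmstat >/dev/null 2>&1; then
    LC_ALL=C vmstat 1 "$n" > "$dir/vmstat-$ts.log" 2>&1 &
  else
    echo "vmstat not found" >&2
  fi
  wait   # both collectors run in the background; wait until they finish
  echo "saved to $dir"
}
```

For example, `capture_evidence /var/tmp/io-evidence 60` records one minute of data. /var/tmp is used rather than /tmp because /tmp is typically cleared at boot on Ubuntu, while /var/tmp is kept.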
NG3: Trusting %util Alone
await can be very high even when %util looks fine.
Copy-Paste Template
# Disk details (most important)
iostat -xz 1
# Overall view
vmstat 1
# Disk list
lsblk
df -h
Summary
- "Slow" does not always mean CPU
- %iowait and %util are the decision axes
- iostat and vmstat go together
- Follow the order: observe → identify cause → act