How to Troubleshoot Disk I/O on Linux - iostat and vmstat Guide

2025-12-16 Reading time: About 10 min Difficulty: Intermediate

What You'll Learn

How to determine if disk I/O is the bottleneck when the server is slow
How to use iostat / vmstat to distinguish CPU wait vs disk congestion
How to get from "the server is slow and I don't know why" to a concrete diagnosis

Quick Summary

When suspecting disk I/O issues:

CPU idle but slow? -> iostat -xz 1
High I/O wait? -> Check %iowait
Write congestion? -> Check await / %util
Confirm disk is the cause? -> vmstat 1

Prerequisites

OS: Ubuntu
Target: Server beginners
Goal: Isolation and diagnosis

1. What is "Slow Disk I/O"?

Conclusion: Slow disk I/O means %iowait rises even when CPU utilization looks low.

"Slow disk" usually means one of these:

Application waiting for disk response
Log writes getting backed up
DB / Docker / backups consuming I/O
CPU is idle but stuck waiting on I/O

Looking at CPU% alone can easily mislead you: a server blocked on disk often shows low CPU usage.

2. Installing iostat

Conclusion: Install sysstat via apt to get iostat — it is absent by default on Ubuntu.

$ sudo apt update
$ sudo apt install -y sysstat
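
Before moving on, it can help to confirm that the tools are actually on the PATH. The following is a minimal sketch; the helper name require_tools is hypothetical and not part of sysstat.

```shell
#!/bin/sh
# Report which of the given commands are missing from PATH.
# Prints one "missing: NAME" line per absent tool and
# returns non-zero if at least one tool is absent.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# iostat is provided by sysstat; vmstat is provided by procps.
require_tools iostat vmstat || echo "install the missing tools before continuing"
```

If nothing is printed, both commands are available.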

3. Using iostat -xz

Conclusion: Run iostat -xz 1 for per-second stats — -x reveals the key I/O metrics.

$ iostat -xz 1

Options:

-x: Extended stats (essential)
-z: Hide zero rows (cleaner output)
1: Update every 1 second (the first report shows averages since boot, so judge from the second report onward)

4. Key Metrics (Core Knowledge)

Conclusion: Focus on %iowait (CPU wait), %util (saturation), await (latency) together.

4-1. %iowait (CPU side)

10%+: Notable I/O wait
20%+: Disk I/O is very likely the bottleneck

%iowait is averaged across all CPUs. It counts time the CPU sat idle while tasks were waiting on disk.

4-2. %util (Disk side)

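Below 70%: Plenty of headroom
70-90%: Getting congested
Constantly near 100%: Saturated

On SSDs, NVMe drives, and RAID arrays, which serve requests in parallel, %util can read 100% before the device reaches its real limit, so check await as well.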

4-3. await (Wait time)

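await is the average time per request in milliseconds, including time spent in the queue. Recent sysstat versions show it as r_await (reads) and w_await (writes).

Few ms: Normal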
10ms+: Slow
50ms+: Noticeably slow
100ms+: Critical

4-4. r/s w/s (Read/Write rate)

Tells you whether reads or writes are the problem. Log bloat and DB writes spike w/s.
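
The thresholds above can be applied automatically. The sketch below filters iostat -x output with awk and prints only the devices that cross them. The function name flag_busy_disks and the default cutoffs (80% util, 10 ms await) are assumptions for illustration. Columns are looked up by header name because their positions differ between sysstat versions, and the worse of r_await / w_await is used when there is no combined await column.

```shell
#!/bin/sh
# Read `iostat -x` output on stdin and print devices over the thresholds.
# $1 = %util cutoff (default 80), $2 = await cutoff in ms (default 10).
flag_busy_disks() {
    awk -v util_max="${1:-80}" -v await_max="${2:-10}" '
        /^Device/ {
            # Header row: remember the position of every column name.
            split("", col)
            for (i = 1; i <= NF; i++) col[$i] = i
            in_table = 1
            next
        }
        NF == 0 { in_table = 0; next }      # a blank line ends the device table
        in_table && ("%util" in col) {
            util = $(col["%util"]) + 0
            await = 0
            if ("await" in col) await = $(col["await"]) + 0
            if ("r_await" in col && $(col["r_await"]) + 0 > await) await = $(col["r_await"]) + 0
            if ("w_await" in col && $(col["w_await"]) + 0 > await) await = $(col["w_await"]) + 0
            if (util >= util_max || await >= await_max)
                printf "%s util=%.1f%% await=%.1fms\n", $1, util, await
        }
    '
}

# Usage (5 one-second samples; LC_ALL=C keeps the decimal point a dot):
#   LC_ALL=C iostat -xz 1 5 | flag_busy_disks 80 10
```

No output means no device crossed either threshold during the sampled interval.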

5. Common Patterns

Conclusion: Three patterns: iowait+util high, CPU+I/O combined, or await high with low util.

Pattern A: CPU low but slow

%iowait high
%util high

Classic disk I/O bottleneck

Pattern B: Both CPU and I/O high

%user / %system high
%iowait also high

Heavy processing + heavy disk writes combined

Pattern C: %util low but still slow

%util low
await high

Storage itself is slow (network storage, EBS, etc.)
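
As a rough sketch, the three patterns can be written as a small decision function. The name classify_io_pattern is hypothetical, and the cutoffs for "high" are assumptions: 10% iowait, 80% util, and 10 ms await come from the thresholds in section 4, while the 70% CPU cutoff is picked only for illustration.

```shell
#!/bin/sh
# Classify one set of readings into Pattern A, B, or C.
# Arguments: CPU busy % (%user + %system), %iowait, %util, await in ms.
classify_io_pattern() {
    awk -v cpu="$1" -v iowait="$2" -v util="$3" -v await="$4" 'BEGIN {
        # Assumed cutoffs for "high"; tune them for your hardware.
        cpu_high    = (cpu    >= 70)
        iowait_high = (iowait >= 10)
        util_high   = (util   >= 80)
        await_high  = (await  >= 10)

        if (iowait_high && cpu_high)        print "B: CPU and I/O both busy"
        else if (iowait_high && util_high)  print "A: disk I/O bottleneck"
        else if (!util_high && await_high)  print "C: slow storage (high latency, low utilization)"
        else                                print "no pattern matched"
    }'
}

# Usage: classify_io_pattern 10 35 95 40
```

Pattern B is tested first because it also has high %iowait and would otherwise be reported as Pattern A.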

6. Using vmstat

Conclusion: Use vmstat 1 — high b with low r column confirms I/O is the bottleneck.

$ vmstat 1

Key columns:

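r: Processes running or waiting for CPU
b: Processes blocked waiting on I/O (high = trouble)
wa: I/O wait (10%+ = warning)

If r is low but b is high, I/O is the cause, not CPU. Ignore the first row of output; it shows averages since boot.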
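
The r / b / wa check can also be scripted. The sketch below reads vmstat output and prints only the samples that look I/O-bound. The function name flag_io_blocked is hypothetical, and "r is low but b is high" is interpreted here as b greater than r together with wa at or above a cutoff (default 10); that interpretation is an assumption.

```shell
#!/bin/sh
# Read `vmstat` output on stdin and print samples that look I/O-bound.
# $1 = minimum wa percentage to report (default 10).
flag_io_blocked() {
    awk -v wa_min="${1:-10}" '
        # The header row that starts with "r b" names the columns.
        $1 == "r" && $2 == "b" {
            split("", col)
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        # Data rows start with a number; the "procs ---memory---" banner does not.
        ("wa" in col) && $1 ~ /^[0-9]+$/ {
            r = $(col["r"]) + 0
            b = $(col["b"]) + 0
            wa = $(col["wa"]) + 0
            if (b > r && wa >= wa_min)
                printf "I/O-bound sample: r=%d b=%d wa=%d%%\n", r, b, wa
        }
    '
}

# Usage: vmstat 1 10 | flag_io_blocked 10
```

Because the first vmstat row is an average since boot, a match on that row alone is not meaningful.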

7. Confirming Disk is the Culprit

Conclusion: Disk is almost certainly the cause when %iowait, %util, await, and vmstat b are all elevated.

If ALL of these are true, disk is almost certainly the cause:

%iowait is high
%util is high
await is spiking
vmstat b column is increasing

8. Common Causes

Conclusion: Docker, databases, log bloat, and backups are the most common disk I/O culprits.

Docker (logs, layers, overlayfs)
Databases (MySQL / PostgreSQL)
Log bloat (access.log, error.log)
Backups (rsync, tar)
Heavy cron jobs
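
To see which of these is responsible on a given server, one option is to rank processes by the I/O counters the kernel exposes in /proc/PID/io. This is a sketch under assumptions: the helper name top_io_procs is hypothetical, the counters are cumulative since each process started (not a rate), and reading other users' processes normally requires root.

```shell
#!/bin/sh
# List the processes with the largest write_bytes counters.
# $1 = proc root (default /proc), $2 = number of rows (default 5).
# Output columns: write_bytes read_bytes PID command-name
top_io_procs() {
    proc_root="${1:-/proc}"
    limit="${2:-5}"
    for io_file in "$proc_root"/[0-9]*/io; do
        [ -r "$io_file" ] || continue            # skip unreadable or vanished processes
        pid_dir="${io_file%/io}"
        pid="${pid_dir##*/}"
        name=$(cat "$pid_dir/comm" 2>/dev/null) || name="?"
        # read_bytes / write_bytes count bytes sent to or fetched from the storage layer.
        awk -v pid="$pid" -v name="$name" '
            BEGIN { rd = 0; wr = 0 }
            $1 == "read_bytes:"  { rd = $2 }
            $1 == "write_bytes:" { wr = $2 }
            END { printf "%s %s %s %s\n", wr, rd, pid, name }
        ' "$io_file" 2>/dev/null
    done | sort -k1,1nr | head -n "$limit"
}

# Usage: sudo sh -c '. ./top_io.sh; top_io_procs /proc 5'   (top_io.sh is a hypothetical file name)
```

Running it twice a few seconds apart and comparing the numbers shows which process is writing right now rather than over its whole lifetime.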

9. What NOT to Do

Conclusion: Three mistakes: CPU-only judgment, rebooting early, and trusting %util alone.

NG1: Judging "All Clear" by CPU Alone

A classic mistake: CPU usage looks fine, so the I/O wait goes unnoticed.

NG2: Rebooting Immediately

The I/O evidence disappears. Observe first.
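
One way to observe first is to save a snapshot to a file before touching anything. The following is a minimal sketch; the function name capture_io_snapshot, the file name pattern, and the default /tmp location are assumptions chosen for illustration.

```shell
#!/bin/sh
# Save iostat / vmstat / lsblk / df output to a timestamped log file.
# $1 = output directory (default /tmp), $2 = samples per tool (default 5).
# Prints the path of the file it wrote.
capture_io_snapshot() {
    out_dir="${1:-/tmp}"
    samples="${2:-5}"
    out_file="$out_dir/io-snapshot-$(date +%Y%m%d-%H%M%S).log"
    {
        echo "### captured at $(date)"
        for cmd in "iostat -xz 1 $samples" "vmstat 1 $samples" "lsblk" "df -h"; do
            tool=${cmd%% *}                      # first word is the program name
            echo "### $cmd"
            if command -v "$tool" >/dev/null 2>&1; then
                $cmd 2>&1 || true                # intentionally unquoted: split into words
            else
                echo "($tool not installed, skipped)"
            fi
        done
    } > "$out_file"
    echo "$out_file"
}

# Usage: capture_io_snapshot /var/tmp 10
```

With the defaults this takes about ten seconds, and the resulting file can be compared against the thresholds above after the server is back to normal.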

NG3: Trusting %util Alone

await can be very high even when %util looks fine (see Pattern C).

Copy-Paste Template

# Disk details (most important)
iostat -xz 1

# Overall view
vmstat 1

# Disk list
lsblk
df -h

Summary

"Slow" does not always mean CPU
%iowait, %util, and await are the key metrics; read them together
iostat and vmstat go together
Follow the order: observe → identify cause → act