vmstat, iostat, and sar - Understanding Linux Performance Analysis Tools

What You Will Learn

  • What each tool does and how to read its output
  • How to identify CPU, memory, I/O, and network bottlenecks
  • When to reach for vmstat vs iostat vs sar

Quick Summary: Role of Each Tool

Tool     Best For
vmstat   System-wide overview: CPU, memory, swap, and I/O at a glance
iostat   Per-device I/O details: await, IOPS, utilization
sar      Historical data: trend analysis and post-incident review

Prerequisites

  • OS: Ubuntu or RHEL-based Linux
  • iostat and sar require the sysstat package
  • Install with: sudo apt install sysstat (Ubuntu/Debian) or sudo dnf install sysstat (RHEL/Fedora)

What Is vmstat?

vmstat (Virtual Memory Statistics) displays CPU, memory, swap, I/O, and process counts in a single output. It is the first tool to reach for when you need a quick system-wide picture to narrow down which subsystem is under pressure.

Basic Usage

vmstat [interval [count]]
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 1543200  52480 921600    0    0    14    38  312  580 12  3 84  1  0
 0  0      0 1540800  52480 922100    0    0     0    12  280  510  5  1 93  1  0
 0  0      0 1539900  52480 922500    0    0     0    16  295  530  4  1 94  1  0

Reading the Fields

procs

  • r: Runnable processes (running or waiting for CPU time). If this consistently exceeds your CPU count, you have CPU saturation.
  • b: Processes blocked in uninterruptible sleep (typically waiting for I/O).

memory (in KB)

  • swpd: Swap in use. A non-zero value is worth noting, but it signals active memory pressure only when si/so are also non-zero.
  • free: Unused memory.
  • buff / cache: Buffer and page cache — memory the OS holds for re-use, not wasted.

swap

  • si: Swap-in rate (disk → memory). Consistently non-zero means memory pressure.
  • so: Swap-out rate (memory → disk). Consistently non-zero means memory pressure.

io (blocks/sec)

  • bi: Blocks read from block devices.
  • bo: Blocks written to block devices.

cpu (%)

  • us: User-space CPU usage.
  • sy: Kernel-space CPU usage.
  • id: Idle time.
  • wa: Time waiting for I/O. Sustained values above 10% indicate an I/O bottleneck.
  • st: Time stolen by the hypervisor from this VM (virtualized environments only).

The first row is cumulative

The first line of vmstat output is the average since boot and is useful only as a baseline. Diagnose from the second row onward. Common patterns: vmstat 1 for continuous monitoring, vmstat 5 12 for a one-minute snapshot at five-second intervals.
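
The thresholds in the field list and the first-row caveat can be combined into a small filter. The sketch below is only an illustration: the function name vmstat_flags is hypothetical, the limits (r above the CPU count, any si/so, wa above 10%) are the rules of thumb from this section, and columns are located by their header names so that layout differences between vmstat versions matter less.

```shell
# vmstat_flags [NCPU]: read vmstat output on stdin and label each sample.
# Hypothetical helper; thresholds are the rules of thumb from this article.
vmstat_flags() {
    awk -v ncpu="${1:-$(nproc)}" '
        # Header row: remember the position of every column by name.
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows start with a number.
        $1 ~ /^[0-9]+$/ && ("r" in col) {
            if (!seen++) next    # first data row is the since-boot average
            flags = ""
            if ($(col["r"]) + 0 > ncpu + 0)      flags = flags " cpu-saturation"
            if ($(col["si"]) + $(col["so"]) > 0) flags = flags " swapping"
            if ($(col["wa"]) + 0 > 10)           flags = flags " io-wait"
            print "sample " (seen - 1) ":" (flags == "" ? " ok" : flags)
        }'
}
```

Typical use would be vmstat -n 1 10 | vmstat_flags, where -n prints the header only once. A single flagged sample means little; look for flags that repeat across consecutive samples.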

What Is iostat?

iostat shows CPU summary statistics alongside per-device I/O metrics. After vmstat raises suspicion of an I/O bottleneck, iostat -x pinpoints which device is the culprit.

Installation (first time only)

$ sudo apt install sysstat   # Ubuntu/Debian
$ sudo dnf install sysstat   # RHEL/Fedora

Basic Usage

$ iostat -x 1 5
Device            r/s     w/s   rkB/s   wkB/s  await  r_await  w_await  %util
sda              1.20    5.30   48.00  212.00   2.50     1.80     2.70   3.20
nvme0n1         25.00   80.00  800.00 3200.00   0.48     0.40     0.52   8.50

Key Fields

Field      Description                               Warning Threshold
await      Average response time per request (ms)    HDD: >20 ms / SSD: >1 ms
r_await    Read response time (ms)                   -
w_await    Write response time (ms)                  -
%util      Device busy percentage                    Concern: >80%, Saturation: 100%
r/s, w/s   Read/write operations per second (IOPS)   Compare against device spec

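When %util approaches 100%, the device was busy for nearly the whole interval: the I/O queue builds up and response times inflate. A spike in await alongside high %util confirms the device is the bottleneck. For devices that serve requests in parallel (SSDs, NVMe drives, RAID arrays), %util near 100% does not by itself prove saturation, so weigh it together with await. Note that %util is a device-level metric, not partition-level. Depending on the sysstat version, iostat -x may print only r_await and w_await without the combined await column.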
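
To apply the 80% utilization threshold to live output, a filter along these lines could work. It is a sketch with a hypothetical name (iostat_busy), and it assumes the utilization column is labeled %util, which is how iostat itself prints it.

```shell
# iostat_busy [LIMIT]: print devices whose %util exceeds LIMIT (default 80).
# Hypothetical helper; expects the output of iostat -x on stdin.
iostat_busy() {
    awk -v limit="${1:-80}" '
        NF == 0 { u = 0; next }                  # blank line ends a report
        $1 ~ /^Device/ {                         # header: find the %util column
            for (i = 1; i <= NF; i++) if ($i == "%util") u = i
            next
        }
        u && NF >= u && $u + 0 > limit + 0 { print $1, $u }'
}
```

For example, iostat -dxy 1 5 | iostat_busy 80 prints one line per device and interval that crosses the limit; -y omits the first report, which holds since-boot statistics.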

Filter to Specific Devices

$ iostat -x -d sda nvme0n1 1

What Is sar?

sar (System Activity Reporter) continuously collects CPU, memory, I/O, and network metrics and stores them as historical data. Its main value is answering questions like "what happened at 3 AM last night?" that real-time tools cannot.

Enabling sar

$ sudo apt install sysstat
$ sudo systemctl enable sysstat --now

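Once enabled, the sadc collector runs on a schedule (via cron or a systemd timer) and writes records to /var/log/sa/saDD on RHEL-based systems or /var/log/sysstat/saDD on Debian/Ubuntu, where DD is the two-digit day of the month. One file accumulates per day.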

Common Options

$ sar -u 1 5          # CPU utilization
$ sar -r 1 5          # Memory usage
$ sar -b 1 5          # I/O statistics
$ sar -n DEV 1 5      # Network stats per interface
$ sar -n EDEV 1 5     # Network error statistics
$ sar -q 1 5          # Load average and process counts

Reviewing Historical Data

# Today's CPU stats (all recorded intervals)
$ sar -u

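# A full day of all metrics (-A flag); the excerpt below shows only the CPU section
# Columns: time, CPU, %user, %nice, %system, %iowait, %steal, %idle
$ sar -A -f /var/log/sa/sa01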
00:00:01    all      2.34      0.00      5.67      0.12      0.00     91.87
01:00:01    all      1.23      0.00      3.45      0.08      0.00     95.24
02:00:01    all      0.98      0.00      2.11      0.05      0.00     96.86
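
For a question like "what happened at 3 AM last night?", it helps to open the right daily file and narrow the time window with sar's -s (start) and -e (end) options. The helper below is a hypothetical convenience, not part of sysstat: it builds the saDD path from a day of the month and defaults to /var/log/sa (pass /var/log/sysstat as the second argument on Debian/Ubuntu).

```shell
# sa_file DAY [DIR]: print the path of the sar data file for a day of the month.
# Hypothetical helper; DIR defaults to the RHEL-style location.
sa_file() {
    # Strip one leading zero so printf does not read "08" as an octal number.
    printf '%s/sa%02d\n' "${2:-/var/log/sa}" "${1#0}"
}
```

A possible invocation, assuming GNU date is available: sar -u -s 02:30:00 -e 03:30:00 -f "$(sa_file "$(date -d yesterday +%d)")".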

Changing the collection interval

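The default collection interval is 10 minutes. It is set in /etc/cron.d/sysstat on cron-based setups or in the sysstat-collect.timer unit on systemd-based ones; /etc/sysstat/sysstat (on Debian/Ubuntu) holds related settings such as how many days of history to keep. For production environments where you need post-incident precision, change the interval to 1 or 2 minutes. Keep in mind that shorter intervals increase disk usage roughly in proportion to the number of samples.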
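
The disk-usage trade-off is simple arithmetic: the number of records per day is 1440 minutes divided by the interval. A minimal sketch (the function name is made up for illustration):

```shell
# samples_per_day MINUTES: number of records collected per day at that interval.
samples_per_day() {
    echo $(( 24 * 60 / $1 ))
}
```

samples_per_day 10 gives 144, samples_per_day 2 gives 720, and samples_per_day 1 gives 1440, so moving from 10-minute to 1-minute collection means about ten times as many records in each daily file.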

Which Tool Should You Use?

Matching the right tool to your investigation question cuts diagnosis time significantly.

Question                            Tool               Key Fields
What is the overall system state?   vmstat 1           r, wa, si/so
Is CPU the bottleneck?              vmstat / sar -u    r, us+sy, id
Which disk is slow?                 iostat -x 1        await, %util
Is the system swapping?             vmstat             si, so, swpd
What happened last night?           sar -u / -r / -b   Time-series view
Is the network saturated?           sar -n DEV         rxkB/s, txkB/s

Practical Bottleneck Detection Patterns

The four diagnostic sequences below are common in production environments.

1. Suspected CPU Bottleneck

# Step 1: System overview
$ vmstat 1 10
# Watch if r (run queue) consistently exceeds CPU count

# Step 2: Confirm with sar
$ sar -u 1 10
# us + sy sustained above 90% → CPU is the bottleneck
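
The 90% rule can be checked mechanically. The filter below is a sketch under a few assumptions: the name sar_cpu_busy is hypothetical, the input is sar -u output, and the columns are found through the %user and %system header labels.

```shell
# sar_cpu_busy [LIMIT]: print samples where %user + %system exceeds LIMIT (default 90).
# Hypothetical helper; expects the output of sar -u on stdin.
sar_cpu_busy() {
    awk -v limit="${1:-90}" '
        /%user/ {                                # header: locate the columns
            for (i = 1; i <= NF; i++) {
                if ($i == "%user")   u = i
                if ($i == "%system") s = i
            }
            next
        }
        u && s && NF >= s && $1 != "Average:" {  # skip the summary line
            busy = $u + $s
            if (busy > limit + 0) printf "%s busy=%.1f%%\n", $1, busy
        }'
}
```

Running it as LC_ALL=C sar -u 1 10 | sar_cpu_busy keeps timestamps in 24-hour form and decimals as dots. Several consecutive hits, rather than one, are what point to a CPU bottleneck.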

2. Suspected Memory or Swap Pressure

# Step 1: Check swap activity
$ vmstat 1 10
# si/so consistently non-zero → active swapping (memory pressure)

# Step 2: Memory and swap detail
$ sar -r 1 5
$ sar -S 1 5
# High %memused (sar -r) + increasing kbswpused (sar -S) → needs action

3. Suspected I/O Bottleneck

# Step 1: vmstat to confirm wa
$ vmstat 1 5
# wa above 10% → I/O bottleneck suspected

# Step 2: iostat to identify the device
$ iostat -x 1 10
# Focus on devices with high await or %util above 80%
# HDD warning: await >20ms / SSD warning: await >1ms
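
The latency thresholds can be applied the same way. This sketch uses a hypothetical function, iostat_slow, that takes the limit in milliseconds (for instance 20 for an HDD, 1 for an SSD) and checks the r_await and w_await columns, which it locates by header name.

```shell
# iostat_slow [LIMIT_MS]: print devices whose read or write latency exceeds LIMIT_MS.
# Hypothetical helper; expects the output of iostat -x on stdin. Default limit: 20 ms.
iostat_slow() {
    awk -v limit="${1:-20}" '
        NF == 0 { r = w = 0; next }              # blank line ends a report
        $1 ~ /^Device/ {                         # header: locate both columns
            for (i = 1; i <= NF; i++) {
                if ($i == "r_await") r = i
                if ($i == "w_await") w = i
            }
            next
        }
        r && w && NF >= r && NF >= w && ($r + 0 > limit + 0 || $w + 0 > limit + 0) {
            print $1, "r_await=" $r, "w_await=" $w
        }'
}
```

For example: iostat -dxy 1 10 | iostat_slow 20 for spinning disks, or iostat_slow 1 for SSDs.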

4. Suspected Network Issue

$ sar -n DEV 1 5
# rxkB/s / txkB/s: check bandwidth utilization
$ sar -n EDEV 1 5
# rxerr/s / txerr/s non-zero: possible hardware or driver issue
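
sar -n DEV reports throughput in kB/s, while link speeds are quoted in Mbit/s, so judging utilization needs a conversion. The sketch below assumes that sar's kB means 1024 bytes and that you already know the link speed; the function name and both inputs are assumptions for illustration.

```shell
# link_util_pct KB_PER_SEC LINK_MBIT: percentage of link capacity in use.
# Hypothetical helper; assumes 1 kB = 1024 bytes and 1 Mbit = 1,000,000 bits.
link_util_pct() {
    awk -v kbps="$1" -v mbit="$2" \
        'BEGIN { printf "%.1f\n", kbps * 1024 * 8 / (mbit * 1000000) * 100 }'
}
```

With these assumptions, link_util_pct 61035 1000 prints 50.0: about 61,000 kB/s is half of a 1 Gbit/s link. Recent sysstat versions may also print an %ifutil column that reports this directly.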
