Understanding du vs df: Measuring Disk Usage Correctly

What You'll Learn

  • Explain the different roles of du and df
  • Understand why df shows full but du totals don't match
  • Solve the "I deleted it but space didn't return" mystery on your own
  • Master a systematic investigation pattern for low-disk incidents

Target Audience: Linux beginners, anyone using du and df without fully understanding them

Introduction: Lina's Disk Full Incident

Lina: Linny-senpai, big problem! The server's disk suddenly hit 100%. I checked the largest files with du, but the total doesn't even come close. What's going on?
Linny-senpai: Ah, classic "du vs df mismatch." So many people get tripped up by this.
Lina: Wait, aren't du and df basically the same thing?
Linny-senpai: They look similar, sure. But they're actually measuring completely different things. Let's walk through the difference and why the numbers diverge.

The Short Answer

  • df = free space per filesystem (mount point view)
  • du = usage per directory or file (path-based aggregation)
  • Mismatches mainly come from deleted-but-open files, mount boundaries, and root-reserved blocks

df - Free Space Per Filesystem

Linny-senpai: Let's start with df. It stands for "disk free" and reports total, used, and available space per filesystem — that is, per mount point.
Lina: Per filesystem...?
Linny-senpai: Right. Each mounted location (/, /home, a USB drive at /mnt/usb) is counted separately.

Try It Out

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   42G  5.5G  89% /
tmpfs           1.9G     0  1.9G   0% /dev/shm
/dev/sda2       100G   60G   40G  61% /home

How to Read It

  • Filesystem: Device name (e.g., /dev/sda1)
  • Size: Total capacity
  • Used: Used space
  • Avail: Space available to unprivileged users (excludes root-reserved blocks, so it can be less than Size minus Used)
  • Use%: Usage percentage (watch carefully above 90%)
  • Mounted on: Mount point
Lina: What does -h do?
Linny-senpai: Short for --human-readable. It shows sizes like 50G or 5.5G instead of raw kilobytes. Without -h, you get numbers like 52428800 — pretty painful to read.
Lina: Yeah, counting zeros gets old fast...

Common Options

$ df -h          # Human-readable units (G, M, K)
$ df -T          # Also show filesystem type (ext4, xfs, etc.)
$ df -i          # Show inode usage instead of byte usage
$ df -h /var     # Only the filesystem containing this path

Don't Forget df -i

When you see "No space left on device" despite plenty of free bytes, inode exhaustion is often the cause (typically from a huge number of tiny files). Always check both df -h and df -i.
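
As a rough sketch of checking both views at once, the helper below filters df output for filesystems at or above a given percentage. The name over_threshold is hypothetical, not a standard tool. The sketch assumes GNU df on Linux (where -P forces one line per filesystem) and mount points without spaces.

```shell
# over_threshold LIMIT: read df output on stdin and print "mountpoint percent"
# for every filesystem whose fifth column (Use% or IUse%) is >= LIMIT.
# Hypothetical helper; assumes mount points contain no spaces.
over_threshold() {
    awk -v limit="$1" '
        NR > 1 {
            pct = $5
            sub(/%$/, "", pct)              # "95%" -> "95"
            # pseudo filesystems that report "-" fail the digit test and are skipped
            if (pct ~ /^[0-9]+$/ && pct + 0 >= limit)
                print $6, $5
        }'
}

df -P  | over_threshold 90    # byte usage
df -Pi | over_threshold 90    # inode usage
```

No output from either line means nothing has reached the 90% limit on that axis.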

du - Usage Per Directory

Linny-senpai: Next, du. It stands for "disk usage" and walks files under the given path to compute their total size.
Lina: Walks files — meaning it actually visits each one?
Linny-senpai: Yes. So du on a huge directory can take a while. That's the opposite of df, which returns instantly.

Try It Out

$ du -sh /var/log
1.2G    /var/log

Useful Option Combinations

  • -s (summary): Show only the total
  • -h (human-readable): Friendly units
  • --max-depth=N: Limit recursion to N levels

-sh is the workhorse combo — memorize it.

Find Heavy Subdirectories by Level

$ du -h --max-depth=1 /var
4.0K    /var/games
1.2G    /var/log
512M    /var/cache
24M     /var/lib
1.7G    /var
Lina: Oh, that's super useful!
Linny-senpai: Right? And if you sort by size on top of that, the biggest offenders pop out immediately.

Sort by Size

$ du -sh /var/* 2>/dev/null | sort -h
4.0K    /var/games
4.0K    /var/opt
24M     /var/lib
512M    /var/cache
1.2G    /var/log

Key Points

  • 2>/dev/null: Suppress permission-denied errors from unreadable subdirectories
  • sort -h: Sort correctly with human-readable units (treats 1.2G as larger than 512M)

Plain sort -n only looks at the leading number, so it'd put 1.2G before 512M. Use -h.
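
A quick way to see the difference is to feed the same three sizes to both modes. This is a small illustration assuming GNU sort. LC_ALL=C is set only so that "." is treated as the decimal point regardless of locale.

```shell
# Same three sizes, two sort modes.
printf '1.2G\n512M\n4.0K\n' | LC_ALL=C sort -n   # compares 1.2, 4.0, 512 and ignores the unit
printf '1.2G\n512M\n4.0K\n' | LC_ALL=C sort -h   # unit-aware: K < M < G
```

The first command lists 1.2G, 4.0K, 512M. The second lists 4.0K, 512M, 1.2G.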

The Decisive Difference Between du and df

Lina: I'm starting to see it... so what exactly is the core difference?
Linny-senpai: In one phrase: they look at different layers of the same system. A side-by-side table makes it clear.

Comparison Table

Aspect                   df                    du
Unit                     Filesystem            Directory / file
How it gets data         From the superblock   Walks files directly
Speed                    Instant               Slow on large paths
Deleted-but-open files   Included              Not included
Other mounts             Counted separately    Crosses by default (block with -x)
Root-reserved blocks     Affects Avail         No effect
Lina: Got it — so when df says "90% full" but du only finds 60% worth of files, that's exactly this gap?
Linny-senpai: Exactly. Let's look at the three main causes one by one.

Culprit #1: Deleted-But-Open Files

Linny-senpai: The number one cause of df vs du mismatch: "files that were deleted but are still held open by a process."
Lina: Deleted but still held open...? How is that even possible?
Linny-senpai: On Linux, rm only removes the file's name (its directory entry). If a process still has the file open, the data blocks stay allocated until the last file descriptor closes. Imagine someone runs rm /var/log/access.log while nginx is still writing to it.
Lina: So the log appears to be gone, but it's secretly still eating disk?
Linny-senpai: Exactly. du can't find it because there's no longer a filename to walk. But df sees the blocks as still allocated. That's where the gap comes from.
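
The behaviour can be reproduced safely in a scratch shell. The sketch below assumes Linux with /proc mounted. The temporary file and fd 3 are arbitrary choices for illustration.

```shell
tmp=$(mktemp)                 # scratch file standing in for a log
exec 3>"$tmp"                 # this shell now holds the file open on fd 3
echo "still on disk" >&3
rm "$tmp"                     # removes the name, not the data

[ -e "$tmp" ] || echo "name is gone: $tmp"
link=$(readlink /proc/self/fd/3)   # readlink inherits fd 3, so it resolves the same file
echo "fd 3 -> $link"               # the kernel appends " (deleted)" to the old path

exec 3>&-                     # closing the last descriptor frees the blocks
```

Between the rm and the final exec, du has no path to count, while the filesystem still reports the blocks as used.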

Find Deleted-But-Open Files

$ sudo lsof | grep deleted
nginx     1234  root  5w  REG  8,1  524288000  ... /var/log/nginx/access.log (deleted)
mysqld    5678  mysql 7w  REG  8,1  104857600  ... /tmp/ibdata.tmp (deleted)

How to Read It

  • Column 1: Process name (nginx, mysqld)
  • Column 2: PID (process ID)
  • Column 7: Size in bytes
  • Trailing (deleted): Flag for files that are unlinked but still open

Release the Space

# Restart the holding process so it closes the deleted file
$ sudo systemctl restart nginx
# A reload frees the space only if the service reopens its files on reload
$ sudo systemctl reload mysql

# Advanced: free the space without restarting
# (e.g., truncating the file through /proc/<PID>/fd/<N>; expert territory)
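
For reference, the no-restart approach mentioned in the comment usually means truncating the deleted file through its /proc entry. Here is a minimal sketch against a throwaway file held open by the current shell, assuming Linux with /proc and GNU stat. On a real system you would use /proc/<PID>/fd/<N> with the PID and FD number from the lsof output, and the writing process must tolerate its file shrinking underneath it.

```shell
tmp=$(mktemp)
exec 3>>"$tmp"                            # append mode, as most loggers use
head -c 1048576 /dev/zero >&3             # write 1 MiB
rm "$tmp"                                 # now deleted but still open

before=$(stat -L -c %s /proc/self/fd/3)   # size of the unlinked file
: > /proc/self/fd/3                       # reopen with truncation: blocks are freed
after=$(stat -L -c %s /proc/self/fd/3)
echo "before: $before bytes, after: $after bytes"

exec 3>&-
```

The descriptor stays valid throughout, so the process keeps running. Only the data already written is discarded.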

Check Impact Before Restarting

Before systemctl restart on production, confirm the downtime window and any dependent services. See No space left on device for the full incident playbook.

Culprit #2: Mount Boundaries

Lina: What are the other two causes?
Linny-senpai: One is mount boundaries. Picture /home mounted on its own partition. What happens when you run du -sh /?
Lina: Hmm... it would include /home in the total?
Linny-senpai: Right — by default, du crosses mount points. But df counts each filesystem separately. So du -sh / can come out larger than df's number for /.
# Stay within one filesystem (matches df scope)
$ sudo du -sh -x /

-x (--one-file-system)

This tells du to only aggregate files on the same filesystem as the starting path. Makes the result directly comparable to df.
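
To put the two numbers side by side, something like the following can help. compare_usage is a hypothetical helper name. The comparison is only meaningful when the argument is a mount point, and du needs root to read every directory.

```shell
# compare_usage MOUNTPOINT: print used KiB as df sees it and as du -x sees it.
# Hypothetical helper; -P keeps df output on one line, -k forces 1 KiB units.
compare_usage() {
    df_kb=$(df -Pk "$1" | awk 'NR == 2 { print $3 }')
    du_kb=$(du -skx "$1" 2>/dev/null | awk '{ print $1 }')
    echo "df used: ${df_kb} KiB"
    echo "du -x:   ${du_kb} KiB"
    echo "gap:     $((df_kb - du_kb)) KiB"
}

# Example (run as root; walking a large filesystem takes a while):
# compare_usage /home
```

A large positive gap that remains with -x in place points toward deleted-but-open files rather than mount boundaries.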

Culprit #3: Root-Reserved Blocks

Linny-senpai: Last one is subtle but important. Filesystems like ext4 reserve 5% of capacity for the root user by default.
Lina: Why reserve any?
Linny-senpai: So root can still operate (e.g., to remove files, run repairs) even when regular users fill the disk. df's Avail column excludes those reserved blocks. That's why Size - Used doesn't always equal Avail.
# Check the reserved block count
$ sudo tune2fs -l /dev/sda1 | grep -i reserved
Reserved block count:     655360
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
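
As a worked check, the reserved block count converts to bytes by multiplying it by the filesystem block size. The 4096-byte block size here is an assumption (tune2fs -l reports the real value on its "Block size" line). With it, 655360 blocks come to 2.5 GiB, which is 5% of the 50G filesystem and matches the gap in the earlier df output (50G - 42G - 5.5G = 2.5G).

```shell
reserved_blocks=655360      # from the tune2fs output above
block_size=4096             # assumed; check "Block size" in tune2fs -l
reserved_mib=$((reserved_blocks * block_size / 1024 / 1024))
echo "reserved: ${reserved_mib} MiB"    # 2560 MiB = 2.5 GiB
```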

Lower the Reservation Carefully

You can drop it with tune2fs -m 1 /dev/sda1, but keep the reservation on the root filesystem. Only consider lowering it on dedicated data partitions.

The Practical Investigation Pattern

Lina: I understand the differences now. So when someone says "the disk is full," what do I actually do?
Linny-senpai: Good question. Having a fixed playbook keeps you from panicking and breaking things.

Disk Investigation Playbook (top to bottom)

  1. Get the big picture: df -h (which filesystem is full?)
  2. Check inodes too: df -i (small-files-exhaustion case)
  3. Find heavy directories: sudo du -h --max-depth=1 / 2>/dev/null | sort -h
  4. If df and du disagree: sudo lsof | grep deleted
  5. Trim old logs: Look under /var/log for rotated .gz files
  6. Won't release?: Restart the holding service with systemctl restart

Commands for Each Step

# 1. Big picture
df -h

# 2. Inode check
df -i

# 3. Drill into heavy directories (one level at a time)
sudo du -h --max-depth=1 / 2>/dev/null | sort -h
sudo du -h --max-depth=1 /var 2>/dev/null | sort -h
sudo du -h --max-depth=1 /var/log 2>/dev/null | sort -h

# 4. Open deleted files (sorted by size, biggest first)
sudo lsof | grep deleted | sort -k7 -n -r | head

# 5. Search for individual large files (-xdev keeps find on one filesystem)
sudo find / -xdev -type f -size +100M 2>/dev/null
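
If you repeat the first three steps often, they can be wrapped in a small function. This is only a sketch: triage is a hypothetical name, it assumes GNU du and sort, and it should be run as root for complete du results.

```shell
# triage PATH: steps 1-3 of the playbook for the filesystem containing PATH.
# Hypothetical wrapper around the commands shown above.
triage() {
    echo "== bytes ==";  df -h "$1"
    echo "== inodes =="; df -i "$1"
    echo "== largest entries under $1 =="
    # -x stays on one filesystem; the last line is the total for PATH itself
    du -xh --max-depth=1 "$1" 2>/dev/null | sort -h | tail -n 10
}

# Example:
# triage /var
```

Call it again on the heaviest directory it reports to drill down one level at a time.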

Mini Exercises: Try It on Your Box

Linny-senpai: To lock the knowledge in, run these on your own system.

Exercise 1: Show the usage of your / partition.

Exercise 2: Find the Top 3 largest directories directly under your home.

Exercise 3: Compare df -h with sudo du -sh -x /, then explain the gap in one sentence.

Hint for Exercise 1
df -h /

The Use% column shows the usage. Anything above 90% deserves cleanup.

Hint for Exercise 2
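du -h --max-depth=1 ~ 2>/dev/null | sort -h | tail -4 | head -3

The last line of the sorted output is the total for ~ itself, so tail -4 | head -3 drops it and keeps the three largest subdirectories.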

Hint for Exercise 3

Common reasons the numbers differ:

  • Deleted-but-still-open files (counted only by df)
  • Mount boundaries (unless du -x is used)
  • ext4 root-reserved blocks (affects df's Avail)

Common Pitfalls

Three Patterns to Avoid

  1. Running du -sh / over SSH without nohup → the walk dies if your session disconnects mid-traversal
  2. Deleting files based only on df → useless when the cause is open-deleted files
  3. rm -rf /tmp/* as a blanket sweep → corrupts work files of running applications

Safe Habits

  • Use sudo du -sh /* to get one total per top-level directory, then repeat on the heaviest one, instead of reading through a full recursive listing
  • Always run ls -lh first to double-check the size and timestamp before deleting
  • For large logs, prefer truncate -s 0 logfile over rm — it empties the file in place, even while a process holds it open
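
The truncate habit can be tried on a scratch file first. This sketch assumes GNU coreutils (truncate, stat -c). fd 4 stands in for a logger that opened its file in append mode, which is the usual case. A writer that does not use append mode keeps its old offset, and the file then grows back as a sparse file on the next write.

```shell
log=$(mktemp)
exec 4>>"$log"                       # a "logger" holding the file open in append mode
head -c 1048576 /dev/zero >&4        # write 1 MiB
truncate -s 0 "$log"                 # empty the file in place; fd 4 stays valid
size_after=$(stat -c %s "$log")
echo "after truncate: ${size_after} bytes"
echo "next line" >&4                 # append mode: the write lands at the new end of file
size_final=$(stat -c %s "$log")
echo "after new write: ${size_final} bytes"
exec 4>&-
rm -f "$log"
```

Because the name never disappears, du and df stay in agreement and no restart is needed.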
