Repairing Filesystem Corruption with fsck: A Safe Procedure

2026-06-06 | Reading time: about 12 min | Difficulty: Intermediate

What You'll Learn

How to run fsck without destroying data
How to check and repair the mounted root (/) filesystem
When to use -n / -y / -f, and how ext4 differs from XFS

The one rule that matters most: never run fsck on a mounted filesystem (especially read-write). It rewrites metadata that the kernel is actively using and can corrupt an otherwise healthy filesystem. Always unmount first, or diagnose read-only.

Assumptions

Distro: Ubuntu / Debian family (commands are nearly identical on RHEL family)
Filesystem: ext4 is the main target (XFS / Btrfs use their own tools, covered below)
You have root (sudo)

What is fsck and when do you use it?

Conclusion: fsck checks and repairs filesystem integrity. Reach for it when metadata corruption is likely - Input/output error, Read-only file system, or a failed fsck at boot.

fsck (file system check) is a front-end that dispatches to a per-filesystem checker (fsck.<type>, e.g. e2fsck for the ext family). It verifies the superblock, inode tables, directory structure, and block bitmaps, then repairs inconsistencies. XFS is an exception, though: fsck.xfs is a stub that does nothing, so you check and repair XFS with xfs_repair directly.

Typical situations where you need it:

File operations repeatedly fail with Input/output error
The filesystem suddenly flipped to Read-only file system (the kernel detected corruption and protected it)
Boot stopped at "fsck failed" or "Give root password for maintenance"
After an unclean shutdown or power loss, you want to verify integrity

Journaling filesystems (ext4 / XFS) recover most minor inconsistencies automatically at mount time. A manual fsck is for structural damage the journal replay can't fix, or for hardware-induced corruption.

Why must you never run fsck on a mounted filesystem?

Conclusion: On a mounted FS, the kernel caches and updates metadata in memory. If fsck writes through a separate path, the two views diverge and the "repair" corrupts live structures instead of fixing them.

On a live filesystem, the kernel updates inodes and bitmaps in the buffer cache. fsck reads and writes the block device directly. If fsck writes a "fix" while the kernel's view differs, it destroys structures that were still in use.

That is why repair-mode fsck refuses to run (or warns) when the target is mounted. There are three safe paths:

Unmount, then run (for data partitions)
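Diagnose read-only with -n (writes nothing, so it is safer, but it repairs nothing, and on a mounted filesystem it can report errors that aren't real because the kernel is still changing the disk)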
For the root FS, use rescue mode / boot-time fsck / a live USB (below)

# First, confirm the target is not mounted
$ findmnt /dev/sdb1
$ lsblk -f
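
For scripts, the same check can be turned into a guard. This is a minimal sketch, not a replacement for findmnt: is_mounted is a hypothetical helper name, and it assumes the mount table uses the /proc/self/mounts layout (source device in the first field). It compares names literally, so a device mounted under an alias path (for example a /dev/mapper or by-uuid name) would be missed.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]   (hypothetical helper, for illustration)
# Succeeds (exit 0) if DEVICE appears as a source in the mount table.
# MOUNTS_FILE defaults to /proc/self/mounts; the parameter exists so the
# function can be tried against a saved copy of the table.
is_mounted() {
    dev=$1
    table=${2:-/proc/self/mounts}
    # Field 1 of each line is the mounted source; compare it literally.
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table"
}

# Example guard before any repair-mode run:
# if is_mounted /dev/sdb1; then echo "still mounted, aborting" >&2; exit 1; fi
```

The guard fails closed only for names it recognizes, so treat a "not mounted" answer as a hint and confirm with findmnt before repairing.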

How do you safely check and repair a data partition?

Conclusion: Unmount, diagnose with -n, then repair with -fy if needed. If hardware failure is likely, image the device with ddrescue before repairing.

1. Unmount it

$ sudo umount /dev/sdb1

If umount fails with "target is busy", find what's holding it.

$ sudo fuser -vm /dev/sdb1
$ sudo lsof /dev/sdb1

2. Diagnose read-only first (`-n`)

-n answers "no" to every question and writes nothing. Use it to assess the damage.

$ sudo fsck -n /dev/sdb1

3. Image the device if hardware failure is suspected

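When a physical fault is likely (frequent Input/output error, or the SMART and dmesg signs covered later), rescue the sectors with ddrescue before a repair can make things worse. Write the image and the map file to a different disk with enough free space.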

$ sudo ddrescue /dev/sdb1 /mnt/backup/sdb1.img /mnt/backup/sdb1.log

4. Force-check and auto-repair (`-fy`)

$ sudo fsck -fy /dev/sdb1

-f: force a full check even if the FS is flagged "clean"
-y: answer "yes" to every repair prompt (run unattended to completion)

-y skips the prompts but hands every judgment call to fsck. For important data, read the -n output first; if the damage is localized, consider running fsck with no flag so you can answer each prompt yourself.

How do you check the mounted root (/) filesystem?

Conclusion: You can't unmount a live /. Use one of: schedule an automatic fsck at next boot, boot with fsck.mode=force, or check from a live USB.

The root filesystem is in use, so you normally can't unmount it. Three workarounds:

Option A: Schedule a forced fsck at next boot

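You can request a one-time boot-time check with a flag file. Note that /forcefsck is a legacy SysV-init convention: systemd-fsck honors it only when systemd is built with SysV compatibility, and it logs a message recommending fsck.mode=force instead. If the flag has no effect on your release, use Option B.

# Ubuntu/Debian: force a one-time check on next boot
$ sudo touch /forcefsck
$ sudo reboot

When the flag is honored, the check runs early in boot, while root is still read-only. Don't assume the flag is removed for you: after the reboot, check whether /forcefsck still exists and delete it, or later boots will force a check again.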

Option B: Pass a kernel parameter from GRUB

At the GRUB menu, press e, append the following to the end of the line that starts with linux, then press Ctrl+X or F10 to boot. The edit applies to this boot only:

fsck.mode=force fsck.repair=yes

fsck.repair=yes is like -y (unattended repair). To stay conservative, use fsck.repair=preen (like -p, only safe automatic fixes).
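
After the machine comes back up, you can confirm that the parameters actually reached the kernel by reading /proc/cmdline. The helper below is a small sketch for that; cmdline_value is a hypothetical name, and the optional second argument only exists so the function can be tried on a sample string.

```shell
#!/bin/sh
# cmdline_value KEY [CMDLINE_STRING]   (hypothetical helper, for illustration)
# Prints the value of KEY=value from a kernel command line, or nothing if
# the key is absent. Defaults to the running kernel's /proc/cmdline.
# If the key appears more than once, the last occurrence is printed.
cmdline_value() {
    key=$1
    line=${2:-$(cat /proc/cmdline)}
    val=
    # Word-split the command line on whitespace (intentionally unquoted).
    for word in $line; do
        case $word in
            "$key"=*) val=${word#*=} ;;
        esac
    done
    if [ -n "$val" ]; then
        printf '%s\n' "$val"
    fi
}

# Example: after booting with the GRUB edit above, this should print "force".
# cmdline_value fsck.mode
```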

Option C: Check from a live USB / rescue media

If root is too damaged to boot, start a live USB and check it without mounting.

# In the live environment (never mount the target)
$ sudo fsck -fy /dev/sda2

If the system drops into emergency mode, / is usually mounted read-only. Running fsck is relatively safe when it's read-only, but to be sure, make it explicit with mount -o remount,ro / before you run it.

How do the main fsck options differ?

Conclusion: Diagnose with -n, repair unattended with -y or -p, and force a full check with -f. Combine them to fit the situation.

Option	Meaning	When to use
`-n`	Write nothing, answer no to all	Assess damage first (read-only diagnosis)
`-y`	Answer yes to all (unattended repair)	No interaction / at boot
`-p`	Preen. Auto-fix only what's safe	Standard mode at boot
`-f`	Force a full check even if clean	Suspect damage hidden behind the journal
`-c`	Scan for bad blocks (badblocks, ext only)	Suspect physical media wear
`-A`	Check all filesystems in `/etc/fstab` in order	Checking everything in one pass (SysV-style boot scripts do this)

# fsck dispatches to e2fsck for ext2/3/4. To call it explicitly:
$ sudo e2fsck -fy /dev/sdb1

# If the superblock is damaged, use a backup superblock
$ sudo dumpe2fs /dev/sdb1 | grep -i superblock
$ sudo e2fsck -b 32768 /dev/sdb1
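
32768 is only the first backup location on a typical 4 KiB-block filesystem; if that copy is damaged too, there are more to try. The sketch below pulls every backup location out of the dumpe2fs listing. It assumes dumpe2fs prints lines of the form "Backup superblock at N, ..." and that the primary superblock is still readable enough for dumpe2fs to run; backup_superblocks is a hypothetical helper name.

```shell
#!/bin/sh
# backup_superblocks   (hypothetical helper, for illustration)
# Reads dumpe2fs output on stdin and prints one backup superblock
# block number per line, in the order dumpe2fs listed them.
backup_superblocks() {
    sed -n 's/^.*Backup superblock at \([0-9][0-9]*\).*$/\1/p'
}

# Example (read-only): list the candidates, then pass them to
# e2fsck -b one at a time by hand.
# sudo dumpe2fs /dev/sdb1 2>/dev/null | backup_superblocks
```

Keeping the e2fsck -b step manual is deliberate: each attempt deserves a look at its output before you move on to the next copy.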

Do not combine -p (preen) and -y. Preen auto-fixes only safe issues and stops with an error code when a problem needs human judgment, which -y would override. e2fsck enforces this: it accepts only one of -p, -n, and -y per run and exits with a usage error otherwise.

What about non-ext4 filesystems (XFS / Btrfs)?

Conclusion: fsck.xfs does nothing - XFS uses xfs_repair. Btrfs uses btrfs check. Pick the right tool per filesystem or you aren't actually repairing anything.

Check the filesystem type with lsblk -f or blkid.

$ lsblk -f /dev/sdb1
$ sudo blkid /dev/sdb1

For XFS

XFS has no traditional fsck. /sbin/fsck.xfs exists but does nothing (a stub so boot isn't blocked). The real repair tool is xfs_repair.

# Always unmount first
$ sudo umount /dev/sdb1
# Dry run first (-n changes nothing)
$ sudo xfs_repair -n /dev/sdb1
# Repair
$ sudo xfs_repair /dev/sdb1

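If xfs_repair refuses to run because the log is dirty, first try mounting the filesystem so the kernel replays the log, then unmount it and run xfs_repair again. Only when that mount fails should you use -L (zero the log) as a last resort. It discards the metadata updates still in the log and risks data loss, so don't reach for it casually.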

For Btrfs

$ sudo umount /dev/sdb1
$ sudo btrfs check /dev/sdb1          # diagnose
$ sudo btrfs check --repair /dev/sdb1 # repair (officially a last resort)

Both xfs_repair -L and btrfs check --repair are last-resort operations that can lose data. Image the device with ddrescue before running them.
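
Picking the tool by filesystem type can be sketched as a small lookup. It prints the read-only check command for each type, using only the commands shown in this article; dry_run_cmd is a hypothetical helper name, and it prints the command rather than running it so you can review it first.

```shell
#!/bin/sh
# dry_run_cmd FSTYPE DEVICE   (hypothetical helper, for illustration)
# Prints the read-only check command for a filesystem type.
# Fails (exit 1) for types it does not know, rather than guessing.
dry_run_cmd() {
    case $1 in
        ext2|ext3|ext4) echo "e2fsck -fn $2" ;;
        xfs)            echo "xfs_repair -n $2" ;;
        btrfs)          echo "btrfs check $2" ;;
        *)              echo "no checker known for '$1'" >&2; return 1 ;;
    esac
}

# Example: feed it the type that blkid reports for the device.
# dry_run_cmd "$(sudo blkid -o value -s TYPE /dev/sdb1)" /dev/sdb1
```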

What should you verify after a repair?

Conclusion: Read the exit code, inspect files rescued into lost+found, remount and test read/write. If it recurs, suspect the disk with smartctl.

1. Read the exit code

fsck exit codes are a bitmask.

$ sudo fsck -fy /dev/sdb1; echo "exit=$?"

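Code	Meaning
0	No errors
1	Errors corrected (fine)
2	Corrected, but a reboot is needed
4	Errors left uncorrected
8	Operational error
16	Usage or syntax error
32	Checking canceled by user request
128	Shared-library error

Because the codes are bits, they can add up: 5 means some errors were corrected (1) and some were left uncorrected (4). 0 or 1 means a clean finish. If bit 2 is set, reboot before using the filesystem. If bit 4 is set, damage remains: re-run fsck, and image the device if it persists. If bit 8 or a higher bit is set, the run itself failed, so the result tells you nothing reliable about the filesystem.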
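
Since the exit status is a bitmask, a combined value such as 5 has to be split into its bits before you can read it. The sketch below does that split; decode_fsck_exit is a hypothetical helper name, and the bit meanings (including 16, 32, and 128) are the ones documented in the fsck(8) man page.

```shell
#!/bin/sh
# decode_fsck_exit CODE   (hypothetical helper, for illustration)
# Prints one line per bit that is set in an fsck exit status.
decode_fsck_exit() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    # Each entry is BIT:MESSAGE; test every bit independently.
    for pair in \
        "1:errors corrected" \
        "2:reboot needed" \
        "4:errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:canceled by user request" \
        "128:shared-library error"
    do
        bit=${pair%%:*}
        msg=${pair#*:}
        if [ $((code & bit)) -ne 0 ]; then
            echo "$msg"
        fi
    done
}

# Example:
# sudo fsck -fy /dev/sdb1; decode_fsck_exit $?
```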

2. Inspect lost+found

Inodes that lost their parent directory are recovered into lost+found/ at the FS root, named by inode number.

$ sudo mount /dev/sdb1 /mnt
$ sudo ls -l /mnt/lost+found/
# lost+found is readable by root only, so expand the glob inside a root shell
$ sudo sh -c 'file /mnt/lost+found/*'

3. Remount and test read/write

# /mnt is still mounted from step 2; mount only if it isn't
$ findmnt /mnt >/dev/null || sudo mount /dev/sdb1 /mnt
$ sudo touch /mnt/.write-test && sudo rm /mnt/.write-test && echo "write OK"

4. If it recurs, suspect the hardware

$ sudo smartctl -a /dev/sdb | grep -iE 'reallocated|pending|uncorrect'
$ sudo dmesg | grep -iE 'I/O error|ata|medium error'

If Reallocated_Sector_Ct or Current_Pending_Sector is climbing, it's a physical fault. Prefer replacing the disk and restoring from backup over keeping it alive with fsck.
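
To track those counters over time, you can extract just the raw values and log them. The filter below is a sketch under two assumptions: the input is the ATA attribute table that smartctl -A prints (name in column 2, raw value in column 10), and a non-zero raw value is worth recording. NVMe drives report health in a different format, and failing_smart_attrs is a hypothetical helper name.

```shell
#!/bin/sh
# failing_smart_attrs   (hypothetical helper, for illustration)
# Reads `smartctl -A` output on stdin and prints "NAME RAW" for each
# watched attribute whose raw value is above zero.
failing_smart_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 { print $2, $10 }'
}

# Example: no output means none of the watched counters is above zero.
# sudo smartctl -A /dev/sdb | failing_smart_attrs
```

A single non-zero reading is a snapshot; comparing readings taken days apart is what shows whether a counter is climbing.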

What not to do (checklist)

Conclusion: Repairing while mounted, mistaking the FS type, an unverified -y, and -L without imaging are the classic ways to make corruption worse.

Failure patterns

Running fsck in repair mode (anything other than -n) directly on a mounted /dev/...
Running fsck (i.e. fsck.xfs) on XFS and assuming it "fixed" anything
Jumping straight to -y without reading the -n output
Running xfs_repair -L without imaging when a physical fault is likely
Giving up on a damaged superblock without trying a backup superblock