Repairing Filesystem Corruption with fsck: A Safe Procedure

What You'll Learn

  • How to run fsck without destroying data
  • How to check and repair the mounted root (/) filesystem
  • When to use -n / -y / -f, and how ext4 differs from XFS

Assumptions

  • Distro: Ubuntu / Debian family (commands are nearly identical on RHEL family)
  • Filesystem: ext4 is the main target (XFS / Btrfs use their own tools, covered below)
  • You have root (sudo)

What is fsck and when do you use it?

Conclusion: fsck checks and repairs filesystem integrity. Reach for it when metadata corruption is likely - Input/output error, Read-only file system, or a failed fsck at boot.

fsck (file system check) is a front-end that dispatches to a per-filesystem checker named fsck.<type> (fsck.ext4, which is e2fsck, for the ext family). For ext4, e2fsck verifies the superblock, inode tables, directory structure, and block bitmaps, then repairs inconsistencies. XFS and Btrfs ship separate repair tools, covered below.

Typical situations where you need it:

  • File operations repeatedly fail with Input/output error
  • The filesystem suddenly flipped to Read-only file system (the kernel detected corruption and protected it)
  • Boot stopped at "fsck failed" or "Give root password for maintenance"
  • After an unclean shutdown or power loss, you want to verify integrity

Journaling filesystems (ext4 / XFS) recover most minor inconsistencies automatically at mount time. A manual fsck is for structural damage the journal replay can't fix, or for hardware-induced corruption.

Why must you never run fsck on a mounted filesystem?

Conclusion: A mounted FS has the kernel caching and updating metadata. If fsck writes through a separate path, the two views diverge and the "repair" corrupts live structures instead of fixing them.

On a live filesystem, the kernel updates inodes and bitmaps in the buffer cache. fsck reads and writes the block device directly. If fsck writes a "fix" while the kernel's view differs, it destroys structures that were still in use.

That is why repair-mode fsck refuses to run (or warns) when the target is mounted. There are three safe paths:

  1. Unmount, then run (for data partitions)
  2. Diagnose read-only with -n (writes nothing, so it is safe, but it repairs nothing and can report spurious errors on a filesystem that is mounted read-write)
  3. For the root FS, use rescue mode / boot-time fsck / a live USB (below)

# First, confirm the target is not mounted
$ findmnt /dev/sdb1
$ lsblk -f
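
For scripted use, the same check can be turned into a guard. This is a minimal sketch rather than part of the original procedure: the function name is_mounted is hypothetical, and it only matches the exact device path recorded in /proc/mounts, so a device mounted under another name (a /dev/mapper path or a by-uuid symlink, for example) would have to be resolved first.

```shell
# Hypothetical helper: succeed if DEVICE appears as a mount source.
# The second argument (mounts table) defaults to /proc/mounts and exists
# only so the function can be tried against a sample file.
is_mounted() {
    dev=$1
    table=${2:-/proc/mounts}
    # Field 1 of each line is the mount source; exit 0 only on an exact match.
    awk -v dev="$dev" '$1 == dev { found = 1 } END { exit !found }' "$table"
}

# Usage: refuse to continue while the target is still mounted.
# is_mounted /dev/sdb1 && { echo "/dev/sdb1 is mounted, aborting" >&2; exit 1; }
```

The exact-match comparison is deliberate: a substring match would confuse /dev/sdb1 with /dev/sdb10.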

How do you safely check and repair a data partition?

Conclusion: Unmount, diagnose with -n, then repair with -fy if needed. If hardware failure is likely, image the device with ddrescue before repairing.

1. Unmount it

$ sudo umount /dev/sdb1

If umount fails with "target is busy", find the processes that are holding the filesystem open.

$ sudo fuser -vm /dev/sdb1
$ sudo lsof /dev/sdb1

2. Diagnose read-only first (-n)

-n answers "no" to every question and writes nothing. Use it to assess the damage.

$ sudo fsck -n /dev/sdb1

3. Image the device if hardware failure is suspected

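When a physical fault is likely (frequent Input/output error, or the SMART warnings described in the verification section), copy the partition to an image on a different disk with ddrescue before a repair can make things worse. The third argument is the mapfile, which lets ddrescue resume an interrupted run.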

$ sudo ddrescue /dev/sdb1 /mnt/backup/sdb1.img /mnt/backup/sdb1.log
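
On a disk that is actively failing, imaging is often split into a fast first pass and a slower retry pass that reuses the same mapfile. The sketch below illustrates that pattern under stated assumptions: the function name image_partition and the DRY_RUN switch are hypothetical, and the retry count of 3 is an arbitrary choice. The flags -n, -d, and -r are GNU ddrescue options.

```shell
# Hypothetical helper: two-pass imaging with GNU ddrescue.
# Set DRY_RUN=1 to print the commands instead of running them.
image_partition() {
    src=$1 img=$2 map=$3
    run=${DRY_RUN:+echo}
    # Pass 1: copy everything that reads cleanly, skip scraping bad areas (-n).
    $run ddrescue -n "$src" "$img" "$map" || return
    # Pass 2: revisit the bad areas with direct disc access (-d), 3 retries (-r3).
    $run ddrescue -d -r3 "$src" "$img" "$map"
}

# Usage (as root, image on a different disk than the source):
# image_partition /dev/sdb1 /mnt/backup/sdb1.img /mnt/backup/sdb1.log
```

Because both passes share the mapfile, the second pass only touches sectors the first one could not read.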

4. Force-check and auto-repair (-fy)

$ sudo fsck -fy /dev/sdb1

  • -f: force a full check even if the FS is flagged "clean"
  • -y: answer "yes" to every repair prompt (run unattended to completion)

-y skips the prompts but hands every judgment call to fsck. For important data, read the -n output first; if the damage is localized, consider running fsck with no flag so you can answer each prompt yourself.

How do you check the mounted root (/) filesystem?

Conclusion: You can't unmount a live /. Use one of: schedule an automatic fsck at next boot, boot with fsck.mode=force, or check from a live USB.

The root filesystem is in use, so you normally can't unmount it. Three workarounds:

Option A: Schedule a forced fsck at next boot

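Older Ubuntu/Debian releases (sysvinit / Upstart) honor a flag file at boot. On current systemd-based releases /forcefsck is deprecated and may be ignored, especially when root is checked from the initramfs, so prefer Option B there.

# Older Ubuntu/Debian: request a one-time check on next boot
$ sudo touch /forcefsck
$ sudo reboot

Where the flag is honored, the boot scripts run the check early in boot (while root is still read-only) and remove the flag afterwards. If /forcefsck still exists after the reboot, the check did not run; delete the file and use Option B.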

Option B: Pass a kernel parameter from GRUB

At the GRUB menu press e, then append to the end of the linux line:

fsck.mode=force fsck.repair=yes

fsck.repair=yes is like -y (unattended repair). To stay conservative, use fsck.repair=preen (like -p, only safe automatic fixes).
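
An edit made with e at the GRUB menu applies to that boot only, so it can be worth confirming afterwards that the parameters actually reached the kernel. A minimal sketch, assuming the running kernel's command line is readable at /proc/cmdline; the function name fsck_boot_params is hypothetical.

```shell
# Hypothetical helper: print the fsck.* parameters found on a kernel command line.
# The file argument defaults to /proc/cmdline; it is a parameter only so the
# function can be tried against sample text.
fsck_boot_params() {
    cmdline_file=${1:-/proc/cmdline}
    # Split on whitespace, then keep only tokens that start with "fsck."
    tr -s ' \t' '\n\n' < "$cmdline_file" | grep '^fsck\.'
}

# Usage after the reboot:
# fsck_boot_params || echo "no fsck.* parameters on this boot"
```

The function exits non-zero when no fsck.* parameter is present, which makes the "was the check requested?" question easy to script.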

Option C: Check from a live USB / rescue media

If root is too damaged to boot, start a live USB and check it without mounting.

# In the live environment (never mount the target)
$ sudo fsck -fy /dev/sda2

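If the system drops into emergency mode, / is usually mounted read-only. Checking a read-only root is far safer than checking a read-write one, but confirm the state with findmnt / and, if needed, make it explicit with mount -o remount,ro / before you run fsck. Reboot immediately after any repair so the kernel does not keep working from stale cached metadata.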

How do the main fsck options differ?

Conclusion: Diagnose with -n, repair unattended with -y or -p, and force a full check with -f. Combine them to fit the situation.

Option  Meaning                                           When to use
-n      Write nothing; answer "no" to every prompt        Assess damage first (read-only diagnosis)
-y      Answer "yes" to every prompt (unattended repair)  No interaction possible / at boot
-p      Preen: auto-fix only what is safe                 Standard mode at boot
-f      Force a full check even if flagged clean          Suspect damage hidden behind the journal
-c      Scan for bad blocks (runs badblocks; ext only)    Suspect physical media wear
-A      Check all filesystems in /etc/fstab in order      Used internally by the boot sequence

# For ext, e2fsck is called directly. To be explicit:
$ sudo e2fsck -fy /dev/sdb1

# If the primary superblock is damaged, list the backups and pass one with -b
# (32768 is the first backup on a typical filesystem with 4 KiB blocks)
$ sudo dumpe2fs /dev/sdb1 | grep -i superblock
$ sudo e2fsck -b 32768 /dev/sdb1
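
The grep above prints whole lines; to get just the block numbers to try with -b, the output can be filtered further. A minimal sketch, assuming dumpe2fs prints lines of the form "Backup superblock at N, Group descriptors at ..."; the function name backup_superblocks is hypothetical, and if dumpe2fs cannot read the filesystem at all it will print nothing.

```shell
# Hypothetical helper: read dumpe2fs output on stdin and print the block
# number of every backup superblock, one per line.
backup_superblocks() {
    # Matching lines look like: "  Backup superblock at 32768, Group descriptors at ..."
    sed -n 's/.*Backup superblock at \([0-9][0-9]*\),.*/\1/p'
}

# Usage (as root, filesystem unmounted): list the candidates for e2fsck -b.
# dumpe2fs /dev/sdb1 2>/dev/null | backup_superblocks
```

If the first backup is also damaged, the later entries in the list are the next candidates.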

Do not combine -p (preen) and -y. e2fsck rejects the combination outright: only one of -p, -n, or -y may be given. The intents also conflict. Preen auto-fixes only safe issues and stops with an error code when a problem needs human judgment, while -y approves every fix without asking.

What about non-ext4 filesystems (XFS / Btrfs)?

Conclusion: fsck.xfs does nothing - XFS uses xfs_repair. Btrfs uses btrfs check. Pick the right tool per filesystem or you aren't actually repairing anything.

Check the filesystem type with lsblk -f or blkid.

$ lsblk -f /dev/sdb1
$ sudo blkid /dev/sdb1

For XFS

XFS has no traditional fsck. /sbin/fsck.xfs exists but does nothing (a stub so boot isn't blocked). The real repair tool is xfs_repair.

# Always unmount first
$ sudo umount /dev/sdb1
# Dry run first (-n changes nothing)
$ sudo xfs_repair -n /dev/sdb1
# Repair
$ sudo xfs_repair /dev/sdb1

Only when a damaged log makes xfs_repair refuse to run should you use -L (zero the log) as a last resort. It risks data loss, so don't reach for it casually.

For Btrfs

$ sudo umount /dev/sdb1
$ sudo btrfs check /dev/sdb1          # diagnose
$ sudo btrfs check --repair /dev/sdb1 # repair (officially a last resort)
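
The per-filesystem choice can be captured in a small lookup so the wrong tool is never run by habit. This is a sketch under stated assumptions: the function name readonly_check_cmd is hypothetical, it covers only the three filesystems discussed here, and it prints the read-only diagnostic command rather than running it.

```shell
# Hypothetical helper: map a filesystem type to its read-only check command.
# Prints the command; it does not run anything.
readonly_check_cmd() {
    fstype=$1 dev=$2
    case $fstype in
        ext2|ext3|ext4) echo "e2fsck -fn $dev" ;;
        xfs)            echo "xfs_repair -n $dev" ;;
        btrfs)          echo "btrfs check --readonly $dev" ;;
        *)              echo "unsupported filesystem type: $fstype" >&2; return 1 ;;
    esac
}

# Usage (as root): detect the type with blkid, then print the matching command.
# readonly_check_cmd "$(blkid -o value -s TYPE /dev/sdb1)" /dev/sdb1
```

Printing instead of executing keeps a human in the loop, in line with the advice to diagnose before repairing.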

What should you verify after a repair?

Conclusion: Read the exit code, inspect files rescued into lost+found, remount and test read/write. If it recurs, suspect the disk with smartctl.

1. Read the exit code

fsck exit codes are a bitmask.

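$ sudo fsck -fy /dev/sdb1; echo "exit=$?"

Code  Meaning
0     No errors
1     Errors corrected (fine)
2     Errors corrected, but the system should be rebooted
4     Errors left uncorrected
8     Operational error
16    Usage or syntax error
32    Checking canceled by user request
128   Shared-library error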

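Because the value is a bitmask, codes combine: 3 means "errors corrected" plus "reboot needed". 0 or 1 means a clean finish. If bit 2 is set, reboot before trusting the filesystem. If bit 4 or 8 is set, re-run the check or image the device.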
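
Since several bits can be set at once, comparing the exit status against a single number can miss a condition. A minimal sketch of decoding it bit by bit; the function name decode_fsck_exit is hypothetical, and only the codes 1, 2, 4, and 8 are handled.

```shell
# Hypothetical helper: print one line per bit set in an fsck exit status.
decode_fsck_exit() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    # Test each bit separately; more than one line may be printed.
    [ $((code & 1)) -ne 0 ] && echo "errors corrected"
    [ $((code & 2)) -ne 0 ] && echo "reboot recommended"
    [ $((code & 4)) -ne 0 ] && echo "errors left uncorrected"
    [ $((code & 8)) -ne 0 ] && echo "operational error"
    return 0
}

# Usage (as root):
# fsck -fy /dev/sdb1; decode_fsck_exit $?
```

For example, decode_fsck_exit 3 prints "errors corrected" followed by "reboot recommended".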

2. Inspect lost+found

Inodes that lost their parent directory are recovered into lost+found/ at the FS root, named by inode number.

$ sudo mount /dev/sdb1 /mnt
$ sudo ls -l /mnt/lost+found/
# lost+found is readable only by root, so the glob must be expanded by a root shell
$ sudo sh -c 'file /mnt/lost+found/*'

3. Remount and test read/write

# Mount it only if it is not already mounted from the previous step
$ findmnt /mnt || sudo mount /dev/sdb1 /mnt
$ sudo touch /mnt/.write-test && sudo rm /mnt/.write-test && echo "write OK"

4. If it recurs, suspect the hardware

$ sudo smartctl -a /dev/sdb | grep -iE 'reallocated|pending|uncorrect'
$ sudo dmesg | grep -iE 'I/O error|medium error|ata[0-9]+'

If Reallocated_Sector_Ct or Current_Pending_Sector is climbing, it's a physical fault. Prefer replacing the disk and restoring from backup over keeping it alive with fsck.
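
To turn that check into something a script can act on, the raw values can be pulled out of the attribute table. A minimal sketch, assuming the ATA attribute layout printed by smartctl -A, where the raw value is the tenth column (NVMe drives report health in a different format); the function name failing_smart_attrs is hypothetical. A single nonzero reading is a warning sign, while confirming that a value is climbing requires comparing runs over time.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the two
# attributes named in the text when their raw value (column 10) is above zero.
# Exit status is 0 when at least one such attribute was found.
failing_smart_attrs() {
    awk '
        $2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" {
            if ($10 + 0 > 0) { print $2 "=" $10; bad = 1 }
        }
        END { exit !bad }
    '
}

# Usage (as root):
# smartctl -A /dev/sdb | failing_smart_attrs && echo "plan a disk replacement"
```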

What not to do (checklist)

Conclusion: Repairing while mounted, mistaking the FS type, an unverified -y, and -L without imaging are the classic ways to make corruption worse.

Failure patterns

  • Running fsck (read-write) directly on a mounted /dev/...
  • Running fsck (i.e. fsck.xfs) on XFS and assuming it "fixed" anything
  • Jumping straight to -y without reading the -n output
  • Running xfs_repair -L without imaging when a physical fault is likely
  • Giving up on a damaged superblock without trying a backup superblock
