Repairing Filesystem Corruption with fsck: A Safe Procedure
What You'll Learn
- How to run fsck without destroying data
- How to check and repair the mounted root (/) filesystem
- When to use -n / -y / -f, and how ext4 differs from XFS
The one rule that matters most: never run fsck on a mounted filesystem (especially read-write). It rewrites metadata that the kernel is actively using and can corrupt an otherwise healthy filesystem. Always unmount first, or diagnose read-only.
Assumptions
- Distro: Ubuntu / Debian family (commands are nearly identical on RHEL family)
- Filesystem: ext4 is the main target (XFS / Btrfs use their own tools, covered below)
- You have root (sudo)
What is fsck and when do you use it?
Conclusion: fsck checks and repairs filesystem integrity. Reach for it when metadata corruption is likely: Input/output error, Read-only file system, or a failed fsck at boot.
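fsck (file system check) is a front-end that dispatches to a per-filesystem checker named fsck.<type> (for the ext family that is e2fsck). XFS is the exception: its fsck.xfs is a stub, and the real tool, xfs_repair, must be run by hand. For ext filesystems the checker verifies the superblock, inode tables, directory structure, and block bitmaps, then repairs inconsistencies.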
Typical situations where you need it:
- File operations repeatedly fail with Input/output error
- The filesystem suddenly flipped to Read-only file system (the kernel detected corruption and protected it)
- Boot stopped at "fsck failed" or "Give root password for maintenance"
- After an unclean shutdown or power loss, you want to verify integrity
Journaling filesystems (ext4 / XFS) recover most minor inconsistencies automatically at mount time. A manual fsck is for structural damage the journal replay can't fix, or for hardware-induced corruption.
Why must you never run fsck on a mounted filesystem?
Conclusion: On a mounted FS the kernel is caching and updating metadata. If fsck writes through a separate path, the two views diverge and the "repair" corrupts live structures instead of fixing them.
On a live filesystem, the kernel updates inodes and bitmaps in the buffer cache. fsck reads and writes the block device directly. If fsck writes a "fix" while the kernel's view differs, it destroys structures that were still in use.
That is why repair-mode fsck refuses to run (or warns) when the target is mounted. There are three safe paths:
- Unmount, then run (for data partitions)
- Diagnose read-only with -n (writes nothing: safer, but repairs nothing)
- For the root FS, use rescue mode / boot-time fsck / a live USB (below)
# First, confirm the target is not mounted
$ findmnt /dev/sdb1
$ lsblk -f
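The mount check above can be wrapped in a small guard so that a repair command never runs against a mounted device. This is a minimal sketch rather than part of any standard tool: the function names is_mounted and refuse_if_mounted are hypothetical, and the optional second argument (an alternative mounts file) exists only to make the check testable. The comparison is a literal string match, so a filesystem mounted through a /dev/mapper or /dev/disk/by-uuid path must be passed under that same path.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Hypothetical helper: succeeds (exit 0) if DEVICE appears as a mount source.
# MOUNTS_FILE defaults to /proc/self/mounts. The match is a literal string
# comparison on the first column; symlinked device names are not resolved.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/self/mounts}"
}

# refuse_if_mounted DEVICE [MOUNTS_FILE]
# Prints a message and fails when the device is mounted.
refuse_if_mounted() {
    if is_mounted "$@"; then
        echo "refusing: $1 is mounted, unmount it first" >&2
        return 1
    fi
    echo "ok: $1 is not mounted"
}
```

A typical use would be `refuse_if_mounted /dev/sdb1 && sudo fsck -n /dev/sdb1`, so the check only starts when the guard passes.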
How do you safely check and repair a data partition?
Conclusion: Unmount, diagnose with -n, then repair with -fy if needed. If hardware failure is likely, image the device with ddrescue before repairing.
1. Unmount it
$ sudo umount /dev/sdb1
If umount fails with "target is busy", find what is holding the filesystem open.
$ sudo fuser -vm /dev/sdb1
$ sudo lsof /dev/sdb1
2. Diagnose read-only first (-n)
-n answers "no" to every question and writes nothing. Use it to assess the damage.
$ sudo fsck -n /dev/sdb1
3. Image the device if hardware failure is suspected
When a physical fault is likely (frequent Input/output error, or the SMART warnings described in the hardware check below), copy the device to an image with ddrescue before a repair can make things worse.
$ sudo ddrescue /dev/sdb1 /mnt/backup/sdb1.img /mnt/backup/sdb1.log
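A common way to run GNU ddrescue on a failing disk is in two passes that share one mapfile: a fast pass that skips difficult areas, then a slower pass that goes back for them. The sketch below assumes GNU ddrescue and uses its -n (no scrape), -d (direct disk access), and -r (retry passes) options. The function name image_device, the RUN variable used for dry runs, and the retry count of 3 are illustrative choices, not recommendations from the ddrescue authors.

```shell
#!/bin/sh
# image_device SRC IMG MAP
# Hypothetical wrapper around GNU ddrescue. Run as root, or set RUN=echo
# to print the commands instead of executing them.
image_device() {
    src=$1
    img=$2
    map=$3
    # Pass 1: copy everything that reads cleanly, do not scrape bad areas.
    ${RUN:-} ddrescue -n "$src" "$img" "$map" || return
    # Pass 2: revisit the bad areas with direct I/O and 3 retry passes.
    # The mapfile from pass 1 tells ddrescue what is still missing.
    ${RUN:-} ddrescue -d -r3 "$src" "$img" "$map"
}
```

Running `RUN=echo image_device /dev/sdb1 /mnt/backup/sdb1.img /mnt/backup/sdb1.log` first shows exactly what would be executed.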
4. Force-check and auto-repair (-fy)
$ sudo fsck -fy /dev/sdb1
- -f: force a full check even if the FS is flagged "clean"
- -y: answer "yes" to every repair prompt (run unattended to completion)
-y skips the prompts but hands every judgment call to fsck. For important data, read the -n output first; if the damage is localized, consider running fsck with no flag so you can answer each prompt yourself.
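The "diagnose first, repair second" order can be sketched as a small wrapper that runs the read-only pass and only suggests the repair command. Two assumptions to note: the function name diagnose and the RUN dry-run variable are hypothetical, and the sketch adds -f to the diagnostic pass because e2fsck otherwise skips the full check on a filesystem that is flagged clean.

```shell
#!/bin/sh
# diagnose DEVICE
# Hypothetical wrapper: runs a forced, read-only check and prints a suggested
# next step. It never repairs anything itself. Set RUN=echo for a dry run.
diagnose() {
    dev=$1
    rc=0
    # -f: full pass even if marked clean; -n: answer "no", write nothing.
    ${RUN:-} fsck -fn "$dev" || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "no errors found on $dev"
    elif [ $((rc & 4)) -ne 0 ]; then
        echo "errors found; review the output, then consider: fsck -fy $dev"
    else
        echo "fsck could not complete (exit=$rc)"
    fi
    return "$rc"
}
```

Keeping the repair as a printed suggestion rather than an automatic second step leaves the judgment call with the person reading the -n output.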
How do you check the mounted root (/) filesystem?
Conclusion: You can't unmount a live /. Use one of: schedule an automatic fsck at next boot, boot with fsck.mode=force, or check from a live USB.
The root filesystem is in use, so you normally can't unmount it. Three workarounds:
Option A: Schedule a forced fsck at next boot
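On Ubuntu/Debian you can request a boot-time check with a flag file. This is a legacy SysV-era mechanism: systemd-fsck honors it only when built with SysV compatibility, and some current releases ignore it, so switch to Option B if the check does not run.
# Ubuntu/Debian: request a one-time check on next boot
$ sudo touch /forcefsck
$ sudo reboot
When the flag is honored, the check runs early in boot, before root is remounted read-write. After the reboot, confirm that /forcefsck is gone and remove it by hand if it is not; otherwise the check may repeat on every boot.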
Option B: Pass a kernel parameter from GRUB
At the GRUB menu press e, append the following to the end of the line that starts with linux, then boot with Ctrl+X or F10:
fsck.mode=force fsck.repair=yes
fsck.repair=yes is like -y (unattended repair). To stay conservative, use fsck.repair=preen (like -p, only safe automatic fixes).
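After booting with these parameters it is worth confirming that the kernel actually received them, since an edit made at the GRUB menu applies to that one boot only. The helper below is a small sketch: the function name cmdline_fsck_mode is hypothetical, and the optional file argument is there only so the parsing can be tried against a saved command line. It falls back to auto, which is the documented default for fsck.mode.

```shell
#!/bin/sh
# cmdline_fsck_mode [CMDLINE_FILE]
# Hypothetical helper: prints the value of fsck.mode= from the kernel
# command line, or "auto" when the parameter is absent.
cmdline_fsck_mode() {
    mode=auto
    for word in $(cat "${1:-/proc/cmdline}"); do
        case $word in
            fsck.mode=*) mode=${word#fsck.mode=} ;;
        esac
    done
    echo "$mode"
}
```

On a system where systemd runs the root check, the result of the check itself can usually be read with `journalctl -b -u systemd-fsck-root.service`. On Debian/Ubuntu systems that check root from the initramfs, the log may instead be in /run/initramfs/fsck.log.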
Option C: Check from a live USB / rescue media
If root is too damaged to boot, start a live USB and check it without mounting.
# In the live environment (never mount the target)
$ sudo fsck -fy /dev/sda2
If the system drops into emergency mode, / is usually mounted read-only. Running fsck is relatively safe when it's read-only, but to be sure, make it explicit with mount -o remount,ro / before you run it.
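Before running fsck from emergency mode, the read-only state of / can be checked rather than assumed. The sketch below inspects the option string printed by `findmnt -no OPTIONS /`; the function name is_readonly is hypothetical. It matches ro only as a whole option, so an entry such as errors=remount-ro on a read-write mount is not mistaken for a read-only one.

```shell
#!/bin/sh
# is_readonly OPTIONS
# Hypothetical helper: succeeds if the comma-separated mount option string
# contains "ro" as a complete option.
is_readonly() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

In emergency mode this could be used as `is_readonly "$(findmnt -no OPTIONS /)" || mount -o remount,ro /` before starting the check.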
How do the main fsck options differ?
Conclusion: Diagnose with -n, repair unattended with -y or -p, and force a full check with -f. Combine them to fit the situation.
| Option | Meaning | When to use |
|---|---|---|
| -n | Write nothing, answer no to all | Assess damage first (read-only diagnosis) |
| -y | Answer yes to all (unattended repair) | No interaction / at boot |
| -p | Preen. Auto-fix only what's safe | Standard mode at boot |
| -f | Force a full check even if clean | Suspect damage hidden behind the journal |
| -c | Scan for bad blocks (badblocks, ext only) | Suspect physical media wear |
| -A | Check all filesystems in /etc/fstab in order | Used internally by the boot sequence |
# For ext, e2fsck is called directly. To be explicit:
$ sudo e2fsck -fy /dev/sdb1
# If the superblock is damaged, use a backup superblock
$ sudo dumpe2fs /dev/sdb1 | grep -i superblock
$ sudo e2fsck -b 32768 /dev/sdb1
Do not combine -p (preen) and -y. e2fsck rejects the combination outright (only one of -p, -n, or -y may be given), and the intent conflicts anyway: preen auto-fixes only safe issues and stops with an error code when a problem needs human judgment, whereas -y approves every fix.
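For the backup-superblock step, the block numbers can be pulled out of the dumpe2fs output instead of being read by eye. The filter below is a sketch that assumes the usual dumpe2fs line format ("Backup superblock at N, Group descriptors at ..."); the function name backup_superblocks is hypothetical. The value 32768 in the example above is only one common location, and the real list depends on the block size and the size of the filesystem.

```shell
#!/bin/sh
# backup_superblocks
# Hypothetical filter: reads dumpe2fs output on stdin and prints the block
# number of each backup superblock, one per line.
backup_superblocks() {
    sed -n 's/^ *Backup superblock at \([0-9][0-9]*\),.*/\1/p'
}
```

Typical use would be `sudo dumpe2fs /dev/sdb1 2>/dev/null | backup_superblocks`, then trying the printed numbers one at a time with `sudo e2fsck -b <number> /dev/sdb1`.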
What about non-ext4 filesystems (XFS / Btrfs)?
Conclusion: fsck.xfs does nothing; XFS uses xfs_repair. Btrfs uses btrfs check. Pick the right tool per filesystem or you aren't actually repairing anything.
Check the filesystem type with lsblk -f or blkid.
$ lsblk -f /dev/sdb1
$ sudo blkid /dev/sdb1
For XFS
XFS has no traditional fsck. /sbin/fsck.xfs exists but does nothing (a stub so boot isn't blocked). The real repair tool is xfs_repair.
# Always unmount first
$ sudo umount /dev/sdb1
# Dry run first (-n changes nothing)
$ sudo xfs_repair -n /dev/sdb1
# Repair
$ sudo xfs_repair /dev/sdb1
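If xfs_repair refuses to run because the log is dirty, first try mounting and then unmounting the filesystem so the kernel can replay the log. Only if the mount itself fails should you use -L (zero the log) as a last resort. It discards the log entries that were never replayed, so it risks data loss; don't reach for it casually.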
For Btrfs
$ sudo umount /dev/sdb1
$ sudo btrfs check /dev/sdb1            # diagnose
$ sudo btrfs check --repair /dev/sdb1   # repair (officially a last resort)
Both xfs_repair -L and btrfs check --repair are last-resort operations that can lose data. Image the device with ddrescue before running them.
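Picking the tool by filesystem type can be sketched as a lookup from the type string that blkid reports to the read-only diagnostic command for that filesystem. The function name diag_command_for is hypothetical, and the mapping covers only the three filesystems discussed here. Every command it prints is a non-modifying check; btrfs check is read-only unless --repair is given.

```shell
#!/bin/sh
# diag_command_for FSTYPE
# Hypothetical helper: prints the read-only diagnostic command for a
# filesystem type as reported by `blkid -o value -s TYPE DEVICE`.
diag_command_for() {
    case $1 in
        ext2|ext3|ext4) echo "e2fsck -fn" ;;
        xfs)            echo "xfs_repair -n" ;;
        btrfs)          echo "btrfs check" ;;
        *)
            echo "no known checker for filesystem type: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `diag_command_for "$(sudo blkid -o value -s TYPE /dev/sdb1)"` prints the command to run against the unmounted device.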
What should you verify after a repair?
Conclusion: Read the exit code, inspect files rescued into lost+found, remount and test read/write. If the corruption recurs, check the disk with smartctl.
1. Read the exit code
fsck exit codes are a bitmask.
$ sudo fsck -fy /dev/sdb1; echo "exit=$?"
| Code | Meaning |
|---|---|
| 0 | No errors |
| 1 | Errors corrected (fine) |
| 2 | Corrected, but a reboot is needed |
| 4 | Errors left uncorrected |
| 8 | Operational error |
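0 or 1 means a clean finish. Because the codes are a bitmask they can combine: 3 (1 + 2) means errors were corrected and a reboot is needed. If the result is 4 or higher, errors remain or the run itself failed: re-run, or image the device.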
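Since the exit status is a bitmask, a combined value is easier to read once it is split into its bits. The decoder below is a sketch; the function name decode_fsck_exit is hypothetical. Besides the values in the table it includes 16, 32, and 128, which the fsck(8) manual page also defines.

```shell
#!/bin/sh
# decode_fsck_exit CODE
# Hypothetical helper: prints one line per bit set in an fsck exit status.
decode_fsck_exit() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for entry in \
        "1:errors corrected" \
        "2:system should be rebooted" \
        "4:errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:canceled by user request" \
        "128:shared-library error"
    do
        bit=${entry%%:*}          # number before the colon
        if [ $((code & bit)) -ne 0 ]; then
            echo "${entry#*:}"    # text after the colon
        fi
    done
}
```

It can be chained after a run, for example `sudo fsck -fy /dev/sdb1; decode_fsck_exit $?`.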
2. Inspect lost+found
Inodes that lost their parent directory are recovered into lost+found/ at the FS root, named by inode number.
$ sudo mount /dev/sdb1 /mnt
$ sudo ls -l /mnt/lost+found/
$ sudo file /mnt/lost+found/*
3. Remount and test read/write
# Skip the mount if /dev/sdb1 is still mounted on /mnt from the previous step
$ sudo mount /dev/sdb1 /mnt
$ sudo touch /mnt/.write-test && sudo rm /mnt/.write-test && echo "write OK"
4. If it recurs, suspect the hardware
$ sudo smartctl -a /dev/sdb | grep -iE 'reallocated|pending|uncorrect'
$ sudo dmesg | grep -iE 'I/O error|ata[0-9]+|medium error'
If Reallocated_Sector_Ct or Current_Pending_Sector is climbing, it's a physical fault. Prefer replacing the disk and restoring from backup over keeping it alive with fsck.
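To track whether those counters are climbing, it helps to extract just the raw values so they can be logged and compared between runs. The filter below is a sketch that assumes the ATA attribute table printed by `smartctl -A`, where the attribute name is the second column and the raw value is the tenth. The function name smart_nonzero is hypothetical. NVMe drives report health in a different format, and attribute names vary somewhat between vendors.

```shell
#!/bin/sh
# smart_nonzero
# Hypothetical filter: reads `smartctl -A` output (ATA attribute table) on
# stdin and prints NAME=RAW for sector-health attributes with a raw value
# above zero.
smart_nonzero() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 {
        print $2 "=" $10
    }'
}
```

Running `sudo smartctl -A /dev/sdb | smart_nonzero` prints nothing on a disk with clean counters; any output that grows from one run to the next points to the physical fault described above.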
What not to do (checklist)
Conclusion: Repairing while mounted, mistaking the FS type, an unverified -y, and -L without imaging are the classic ways to make corruption worse.
Failure patterns
- Running fsck (read-write) directly on a mounted /dev/...
- Running fsck (i.e. fsck.xfs) on XFS and assuming it "fixed" anything
- Jumping straight to -y without reading the -n output
- Running xfs_repair -L without imaging when a physical fault is likely
- Giving up on a damaged superblock without trying a backup superblock