Diagnosing "Input/output error"
What does "Input/output error" actually mean?
Conclusion: Input/output error is the kernel's EIO (errno 5). It is not a logic mistake - it means an I/O request failed below your application, at the device or storage layer: a bad disk, a flaky cable/controller, a disconnected device, or a network storage outage.
A typical failure looks like this:
$ cat /var/log/app.log
cat: /var/log/app.log: Input/output error
$ cp bigfile /mnt/data/
cp: error reading 'bigfile': Input/output error
Unlike Permission denied (rights) or No space left (capacity), Input/output error means the command was correct but a lower layer could not answer. The cause is outside your app - on the device side.
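The distinction above can be sketched as a small shell helper that maps the text a command printed to the layer it points at. This is only an illustration: classify_error is a hypothetical name, and the match strings assume the English (C locale) messages shown in this article.

```shell
# classify_error MESSAGE
# Hypothetical helper: map an error message to the kind of problem it signals.
# Assumes English (LC_MESSAGES=C) error strings.
classify_error() {
  case "$1" in
    *"Input/output error"*)      echo "EIO: device or storage layer" ;;
    *"Permission denied"*)       echo "EACCES: access rights" ;;
    *"No space left on device"*) echo "ENOSPC: capacity" ;;
    *)                           echo "other" ;;
  esac
}

classify_error "cat: /var/log/app.log: Input/output error"
# prints: EIO: device or storage layer
```

Only the first branch calls for the device-side triage described in the rest of this article.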
Causes fall into a few groups. Triage them in this order:
- A. The disk itself is failing (most common) - bad sectors, wear, SMART faults. Often only specific files return EIO
- B. Connection / controller trouble - a loose SATA/USB cable, insufficient power, or HBA fault drops I/O intermittently
- C. The device was disconnected - a USB/external drive was unplugged, /dev/sdX vanished
- D. Network storage outage - an NFS/iSCSI server is down or timing out
- E. Severe filesystem corruption - damaged metadata rejects reads and writes
Input/output error is a symptom, not a cause. The same message covers a dying disk you must rescue and a transient fault that a reseated cable fixes. Jumping to fsck or a reformat before reading dmesg can destroy data that is still alive. Read the cause first.
What should I check first?
Conclusion: The primary source is the kernel log. Use dmesg -T or journalctl -k to read the device name (sda, etc.) and the exact error (I/O error, sector, link reset) from the moment EIO appeared. That almost always narrows it to one of A-E.
Listen to the kernel with dmesg / journalctl
EIO is a failure the kernel received from a device driver and passed up. The evidence is always in the kernel log.
# Recent errors, with timestamps
dmesg -T | grep -iE 'error|i/o|fail|reset' | tail -30
# From the journal, current boot (-b); with persistent journaling, -b -1 shows the previous boot
journalctl -k -b -p err --no-pager
How to read the lines:
# A: failing disk (bad sector)
blk_update_request: I/O error, dev sda, sector 1234567 op 0x0:(READ)
critical medium error, dev sda, sector 1234567
# B: connection / link trouble
ata1: SATA link down (SStatus 0 SControl 300)
ata1.00: failed command: READ FPDMA QUEUED
# C: device disconnect (USB unplug, etc.)
sd 6:0:0:0: [sdb] Synchronize Cache(10) failed
usb 1-1: USB disconnect, device number 5
# D: NFS outage
nfs: server 10.0.0.5 not responding, still trying
# E: filesystem corruption
EXT4-fs error (device sda1): ext4_find_entry: reading directory lblock
A medium error with a sector number nails A (failing disk). Stacked link down / reset lines point to B (connection). A disconnect is C. This decides every step that follows.
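As a rough sketch, the groupings above can be turned into a per-line classifier. classify_kernel_line is a hypothetical helper, and the patterns are taken from the sample lines in this article, so they are illustrative rather than exhaustive (other filesystems and drivers word their messages differently). A bare "I/O error, dev" line is filed under A here, as in the samples, but it can also accompany B or C, so read the neighbouring lines before deciding.

```shell
# classify_kernel_line LINE
# Hypothetical helper: print A-E for a kernel log line, or ? if nothing matches.
# Patterns mirror the sample log lines in this article only.
classify_kernel_line() (
  line=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$line" in
    *"medium error"*|*"i/o error, dev"*)                  echo "A" ;;  # failing disk
    *"link down"*|*"resetting link"*|*"failed command"*)  echo "B" ;;  # cable / controller
    *"usb disconnect"*|*"synchronize cache"*)             echo "C" ;;  # device vanished
    *"nfs: server"*"not responding"*)                     echo "D" ;;  # network storage
    *"-fs error"*)                                        echo "E" ;;  # filesystem metadata
    *)                                                    echo "?" ;;
  esac
)

# Possible usage against the live log:
#   dmesg -T | while IFS= read -r l; do
#     printf '%s  %s\n' "$(classify_kernel_line "$l")" "$l"
#   done | grep -v '^?'
```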
Capture the log before you touch the file or device again. A dying disk can degrade with every re-read. "Just cat it one more time" is the worst move.
How do I confirm a failing disk? (SMART)
Conclusion: If dmesg shows a medium error / sector, read the disk's self-diagnostics with smartctl. A rising Reallocated_Sector_Ct or Current_Pending_Sector means physical wear - back up and replace it urgently.
Read SMART data with smartctl from smartmontools (install with apt install smartmontools / dnf install smartmontools).
# Health summary
sudo smartctl -H /dev/sda
# All attributes
sudo smartctl -a /dev/sda
Attributes that matter:
ID#  ATTRIBUTE_NAME          RAW_VALUE
5    Reallocated_Sector_Ct   48    <- reallocated bad sectors. rising = wear
197  Current_Pending_Sector  16    <- suspect sectors awaiting reallocation; the direct EIO cause
198  Offline_Uncorrectable   16    <- unrecoverable sectors
199  UDMA_CRC_Error_Count    120   <- cable/connection origin (the disk itself may be fine)
- Current_Pending_Sector / Reallocated_Sector_Ct non-zero and rising -> the disk is wearing out. Treat it as end-of-life: rescue data and replace.
- Only UDMA_CRC_Error_Count is high -> likely a cable/connection issue (B); reseating or replacing the cable may fix it.
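The two rules above can be sketched as a filter over the attribute table. smart_verdict is a hypothetical helper. It assumes the usual ATA layout printed by smartctl -A, where RAW_VALUE is the tenth column, and it only checks for non-zero counters. Spotting a rising value needs two readings taken some time apart, which this sketch does not do. NVMe drives report health in a different format and are not covered.

```shell
# smart_verdict: read `smartctl -A` style output on stdin, print one verdict line.
# Hypothetical helper. Assumes the 10-column ATA attribute table (RAW_VALUE = $10).
smart_verdict() {
  awk '
    $1 == 5   { realloc = $10 }   # Reallocated_Sector_Ct
    $1 == 197 { pending = $10 }   # Current_Pending_Sector
    $1 == 198 { uncorr  = $10 }   # Offline_Uncorrectable
    $1 == 199 { crc     = $10 }   # UDMA_CRC_Error_Count
    END {
      if (realloc + pending + uncorr > 0) print "media: rescue data and replace the disk"
      else if (crc > 0)                   print "link: check cable and connection"
      else                                print "no media or link counters raised"
    }
  '
}

# Possible usage:
#   sudo smartctl -A /dev/sda | smart_verdict
```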
Confirm with a short self-test:
sudo smartctl -t short /dev/sda # check results with -a a few minutes later
A disk that SMART flags as worn can die completely at any moment. Running fsck or badblocks -w (a write test) first can take the still-readable data with it. The order is always (1) rescue (ddrescue) -> (2) check/repair. Getting a copy onto a healthy disk comes first.
Is it filesystem corruption?
Conclusion: If dmesg shows EXT4-fs error (or similar) but SMART is healthy, repair the metadata with fsck - but only while the filesystem is unmounted. Running it on a mounted FS makes corruption worse.
First inspect read-only (-n writes nothing):
# Identify the target and its mount state
lsblk -f
findmnt /mnt/data
# Unmount, then check (-n = read-only dry run)
sudo umount /dev/sda1
sudo fsck -n /dev/sda1
If the root filesystem (/) is the target and cannot be unmounted, run fsck at boot or from a live USB / rescue mode.
# Force fsck on next boot (for the root FS)
sudo touch /forcefsck   # on systemd, the fsck.mode=force kernel arg is more reliable
Once confirmed, repair for real - assuming important data is already rescued:
sudo fsck -y /dev/sda1 # -y = auto-approve repairs
Never run fsck on a mounted filesystem. The kernel and the tool rewrite the same metadata independently and turn minor damage into a fatal mess. If you cannot umount (device is busy), see Fixing "device is busy" on umount.
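A minimal sketch of a guard for that rule: check whether the device is listed as a mount source before going anywhere near fsck. is_mounted and safe_fsck_check are hypothetical helpers. The optional second argument (a mounts file, defaulting to /proc/mounts) exists only to make the sketch easy to try out. The match is on the exact source name, so a device mounted under another name (for example through /dev/mapper) would slip past it. Treat a pass as a hint, not a proof.

```shell
# is_mounted DEVICE [MOUNTS_FILE]
# Hypothetical helper: succeed if DEVICE appears as a mount source.
is_mounted() {
  awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# safe_fsck_check DEVICE [MOUNTS_FILE]
# Prints a go / no-go message; it never runs fsck itself.
safe_fsck_check() {
  if is_mounted "$1" "${2:-/proc/mounts}"; then
    echo "refusing: $1 is mounted; unmount it first" >&2
    return 1
  fi
  echo "ok: $1 is not listed as mounted"
}

# Possible usage:
#   safe_fsck_check /dev/sda1 && sudo fsck -n /dev/sda1
```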
When it is not the disk
Conclusion: If dmesg shows link down / disconnect / nfs ... not responding, the cause is connection, disconnect, or network - not the disk surface. Physical checks or a remount usually fix it; no fsck needed.
Connection / cable (B)
High UDMA_CRC_Error_Count, stacked SATA link down / ata reset lines:
- Reseat the SATA/USB cable; try a different port or cable
- External drives often fail on insufficient power - use a self-powered USB hub or AC adapter
- Watch dmesg live to see if it improves (dmesg -w)
Device disconnect (C)
On USB disconnect, /dev/sdX vanishes and every later operation returns EIO.
lsblk           # is the device visible?
sudo dmesg -w   # watch the moment you reconnect
A mount on a vanished device is dead. Unmount, reconnect, and remount.
Network storage (D)
An NFS server outage or network fault also surfaces as EIO. Check the server and the path.
mount | grep nfs
ping <nfs-server>
showmount -e <nfs-server>   # are exports visible?
For a hung NFS mount, also read Fixing "Stale file handle" on NFS. Stale file handle (ESTALE) is easy to confuse with EIO but needs a different fix.
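When several NFS mounts are in play, it can help to list the distinct servers first and then check each one. The sketch below pulls the server part out of every nfs/nfs4 entry. nfs_servers is a hypothetical helper, and the optional mounts-file argument is there only for experimentation. It assumes the usual server:/export source form and does not handle bracketed IPv6 addresses.

```shell
# nfs_servers [MOUNTS_FILE]
# Hypothetical helper: print the unique server part of each NFS mount source.
# Assumes sources of the form server:/export (no bracketed IPv6 addresses).
nfs_servers() {
  awk '$3 ~ /^nfs4?$/ { sub(/:.*/, "", $1); print $1 }' "${1:-/proc/mounts}" | sort -u
}

# Possible usage: one quick reachability probe per server.
#   for s in $(nfs_servers); do
#     ping -c 1 -W 2 "$s" >/dev/null 2>&1 && echo "$s reachable" || echo "$s NOT reachable"
#   done
```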
How do I rescue data in an emergency?
Conclusion: Pull data off a dying disk with ddrescue, not plain cp. It grabs the readable blocks first and skips bad ones, so you recover the maximum possible before the disk gives out.
cp stalls on EIO and re-hammers the disk on retry. ddrescue (from gddrescue; apt install gddrescue, command name ddrescue) skips over bad areas and can resume from a map file.
# /dev/sdb (failing disk) -> /dev/sdc (healthy destination)
# the third argument is a map file that lets you pause and resume
# double-check the device names: everything on the destination is overwritten
sudo ddrescue -d -f -r3 /dev/sdb /dev/sdc rescue.map

- -d - direct I/O (read real sectors, bypassing the OS cache)
- -f - force; ddrescue refuses to overwrite a device or partition without it
- -r3 - retry bad blocks up to 3 times
- rescue.map - progress map; rerun the same command to resume from where it stopped
Image the whole device/partition to the destination, then run fsck or file recovery against the copy. The goal is to minimize operations against the dying disk itself.
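One way to keep the load on the failing disk low is to split the rescue into two passes, a pattern shown in the GNU ddrescue manual rather than something prescribed by this article: a fast pass that copies only what reads cleanly, then a second pass that returns to the bad areas. The sketch below only prints the commands so they can be reviewed before anything runs. rescue_plan is a hypothetical helper, and -f is included because ddrescue needs it when the destination is a device.

```shell
# rescue_plan SRC DST MAP
# Hypothetical helper: print (do not run) a two-pass ddrescue sequence.
rescue_plan() {
  if [ "$1" = "$2" ]; then
    echo "source and destination must differ" >&2
    return 1
  fi
  # Pass 1: copy everything that reads cleanly; -n skips the slow scraping phase.
  echo "ddrescue -f -n $1 $2 $3"
  # Pass 2: return to the bad areas with direct I/O (-d) and up to 3 retries (-r3).
  echo "ddrescue -d -f -r3 $1 $2 $3"
}

rescue_plan /dev/sdb /dev/sdc rescue.map
```

Both passes share the same map file, so the second pass only touches areas the first one could not read.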
Fast path: (1) dmesg -T | grep -i error to classify the cause (A-E) -> (2) on a medium error, confirm wear with smartctl -a -> (3) if worn, ddrescue immediately -> (4) for FS corruption, fsck after rescue (unmounted) -> (5) for link/disconnect/NFS, check the physical path. Skipping dmesg is the one mistake to avoid.