Diagnosing "Input/output error"

What does "Input/output error" actually mean?

Conclusion: Input/output error is the kernel's EIO (errno 5). It is not a logic mistake in your command - it means an I/O request failed below the application, usually at the device layer: a bad disk, a flaky cable/controller, a disconnected device, a network storage outage, or badly damaged filesystem metadata.

A typical failure looks like this:

$ cat /var/log/app.log
cat: /var/log/app.log: Input/output error

$ cp bigfile /mnt/data/
cp: error reading 'bigfile': Input/output error

Unlike Permission denied (rights) or No space left (capacity), Input/output error means the command was correct but a lower layer could not answer. The cause is outside your app - on the device side.
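
The distinction above can be sketched as a small lookup. This is only an illustration: errno_of_message is a hypothetical helper name, and it assumes the standard English (C locale) error strings that Linux tools print.

```shell
# errno_of_message MESSAGE: map a familiar error string to its errno and layer.
# Hypothetical helper; assumes English (LC_ALL=C) messages on Linux.
errno_of_message() {
  case $1 in
    *'Input/output error'*)      echo "EIO 5 device-side" ;;
    *'Permission denied'*)       echo "EACCES 13 rights" ;;
    *'No space left on device'*) echo "ENOSPC 28 capacity" ;;
    *)                           echo "unknown" ;;
  esac
}
```

For example, errno_of_message "cat: /var/log/app.log: Input/output error" reports the EIO row, which is the only one of the three that points below the filesystem permissions and free-space checks.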

Causes fall into a few groups. Triage them in this order:

  • A. The disk itself is failing (most common) - bad sectors, wear, SMART faults. Often only specific files return EIO
  • B. Connection / controller trouble - a loose SATA/USB cable, insufficient power, or HBA fault drops I/O intermittently
  • C. The device was disconnected - a USB/external drive was unplugged, /dev/sdX vanished
  • D. Network storage outage - an NFS/iSCSI server is down or timing out
  • E. Severe filesystem corruption - damaged metadata rejects reads and writes

Input/output error is a symptom, not a cause. The same message covers a dying disk you must rescue and a transient fault that a reseated cable fixes. Jumping to fsck or a reformat before reading dmesg can destroy data that is still alive. Read the cause first.

What should I check first?

Conclusion: The primary source is the kernel log. Use dmesg -T or journalctl -k to read the device name (sda, etc.) and the exact error (I/O error, sector, link reset) from the moment EIO appeared. That almost always narrows it to one of A-E.

Listen to the kernel with dmesg / journalctl

EIO is a failure the kernel received from a device driver and passed up. The evidence is always in the kernel log.

# Recent errors, with timestamps (prefix with sudo if dmesg reports "Operation not permitted")
dmesg -T | grep -iE 'error|i/o|fail|reset' | tail -30

# Kernel errors from the journal for the current boot (-b); with a persistent journal, -b -1 shows the previous boot
journalctl -k -b -p err --no-pager

How to read the lines:

# A: failing disk (bad sector)
blk_update_request: I/O error, dev sda, sector 1234567 op 0x0:(READ)
critical medium error, dev sda, sector 1234567

# B: connection / link trouble
ata1: SATA link down (SStatus 0 SControl 300)
ata1.00: failed command: READ FPDMA QUEUED

# C: device disconnect (USB unplug, etc.)
sd 6:0:0:0: [sdb] Synchronize Cache(10) failed
usb 1-1: USB disconnect, device number 5

# D: NFS outage
nfs: server 10.0.0.5 not responding, still trying

# E: filesystem corruption
EXT4-fs error (device sda1): ext4_find_entry: reading directory lblock

A medium error with a sector number nails A (failing disk). Stacked link down / reset lines point to B (connection). A disconnect is C. This decides every step that follows.
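
As a rough aid for this reading step, the sketch below counts how many log lines match each cause. The function name classify_klog and the patterns are assumptions taken from the sample lines above, not an exhaustive list of kernel messages; I/O error lines can also accompany link or disconnect faults, so treat the counts as a hint and still read the log.

```shell
# classify_klog: read kernel log lines on stdin and print a count per cause letter.
# Patterns are illustrative, based on the sample lines in this article.
classify_klog() {
  awk '
    {
      line = tolower($0)
      if (line ~ /medium error|i\/o error, dev/)             hit["A"]++
      if (line ~ /link down|failed command/)                 hit["B"]++
      if (line ~ /usb disconnect|synchronize cache.*failed/) hit["C"]++
      if (line ~ /nfs: server .* not responding/)            hit["D"]++
      if (line ~ /(ext4|xfs|btrfs).*error/)                  hit["E"]++
    }
    END {
      n = split("A B C D E", order, " ")
      for (i = 1; i <= n; i++)
        if (hit[order[i]] > 0) printf "%s=%d\n", order[i], hit[order[i]]
    }
  '
}

# Usage:
#   dmesg -T | classify_klog
```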

Capture the log before you touch the file or device again. A dying disk can degrade with every re-read. "Just cat it one more time" is the worst move.
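
One way to capture the log is to pipe it into a timestamped file. The helper below is a minimal sketch; snapshot_log is a hypothetical name, and the target directory is assumed to be on a healthy disk, not the one that is throwing EIO.

```shell
# snapshot_log DIR: write stdin to DIR/klog-<timestamp>.txt and print the path.
# DIR should live on a disk other than the failing one.
snapshot_log() {
  dir=${1:-.}
  out="$dir/klog-$(date +%Y%m%dT%H%M%S).txt"
  cat > "$out" && printf '%s\n' "$out"
}

# Usage:
#   sudo dmesg -T | snapshot_log /root
```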

How do I confirm a failing disk? (SMART)

Conclusion: If dmesg shows a medium error / sector, read the disk's self-diagnostics with smartctl. A rising Reallocated_Sector_Ct or Current_Pending_Sector means physical wear - back up and replace it urgently.

Read SMART data with smartctl from smartmontools (install with apt install smartmontools / dnf install smartmontools).

# Health summary
sudo smartctl -H /dev/sda

# All attributes
sudo smartctl -a /dev/sda

Attributes that matter:

ID# ATTRIBUTE_NAME          RAW_VALUE
  5 Reallocated_Sector_Ct   48      <- reallocated bad sectors. rising = wear
197 Current_Pending_Sector  16      <- suspect sectors awaiting reallocation; the direct EIO cause
198 Offline_Uncorrectable   16      <- unrecoverable sectors
199 UDMA_CRC_Error_Count    120     <- cable/connection origin (the disk itself may be fine)

  • Current_Pending_Sector / Reallocated_Sector_Ct non-zero and rising -> the disk is wearing out. Treat it as end-of-life: rescue data and replace.
  • Only UDMA_CRC_Error_Count is high -> likely a cable/connection issue (B); reseating or replacing the cable may fix it.
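
The two rules above can be sketched as a filter over smartctl -A output. This assumes the usual ATA attribute table, where the ID is the first column and RAW_VALUE is the tenth; smart_verdict is a hypothetical name, and a single snapshot cannot show whether a count is rising, so compare runs over time.

```shell
# smart_verdict: read `smartctl -A` output on stdin and print a rough verdict.
# Assumes the 10-column ATA attribute table (ID in column 1, RAW_VALUE in column 10).
smart_verdict() {
  awk '
    NF >= 10 && $1 == 5   { realloc = $10 + 0 }
    NF >= 10 && $1 == 197 { pending = $10 + 0 }
    NF >= 10 && $1 == 198 { uncorr  = $10 + 0 }
    NF >= 10 && $1 == 199 { crc     = $10 + 0 }
    END {
      if (realloc + pending + uncorr > 0) print "disk-wear"
      else if (crc > 0)                   print "link-suspect"
      else                                print "no-wear-indicators"
    }
  '
}

# Usage:
#   sudo smartctl -A /dev/sda | smart_verdict
```

NVMe drives report health differently and do not expose these attribute IDs, so this sketch applies to SATA/ATA disks only.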

Confirm with a short self-test:

sudo smartctl -t short /dev/sda     # check results with -a a few minutes later

Is it filesystem corruption?

Conclusion: If dmesg shows EXT4-fs error (or similar) but SMART is healthy, repair the metadata with fsck - but only while the filesystem is unmounted. Running it on a mounted FS makes corruption worse.

First inspect read-only (-n writes nothing):

# Identify the target and its mount state
lsblk -f
findmnt /mnt/data

# Unmount, then check (-n = read-only dry run)
sudo umount /dev/sda1
sudo fsck -n /dev/sda1

If the root filesystem (/) is the target and cannot be unmounted, run fsck at boot or from a live USB / rescue mode.

# Force fsck on next boot (for the root FS)
sudo touch /forcefsck      # on systemd, the fsck.mode=force kernel arg is more reliable

Once confirmed, repair for real - assuming important data is already rescued:

sudo fsck -y /dev/sda1     # -y = auto-approve repairs

Never run fsck on a mounted filesystem. The kernel and the tool rewrite the same metadata independently and turn minor damage into a fatal mess. If you cannot umount (device is busy), see Fixing "device is busy" on umount.
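
A small guard can enforce that rule before any fsck call. The sketch below checks whether a device appears as a mount source; is_mounted is a hypothetical helper, and it only does an exact match on the name in /proc/self/mounts, so a device mounted under another name (for example a /dev/mapper path) would be missed. findmnt --source does a more thorough lookup.

```shell
# is_mounted DEVICE [MOUNTS_FILE]: succeed if DEVICE is listed as a mount source.
# Exact-name match only; MOUNTS_FILE defaults to /proc/self/mounts.
is_mounted() {
  awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/self/mounts}"
}

# Usage:
#   if is_mounted /dev/sda1; then
#     echo "/dev/sda1 is still mounted; not running fsck" >&2
#   else
#     sudo fsck -n /dev/sda1
#   fi
```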

When it is not the disk

Conclusion: If dmesg shows link down / disconnect / nfs ... not responding, the cause is connection, disconnect, or network - not the disk surface. Physical checks or a remount usually fix it; no fsck needed.

Connection / cable (B)

High UDMA_CRC_Error_Count, or stacked SATA link down / ata reset lines in dmesg:

  • Reseat the SATA/USB cable; try a different port or cable
  • External drives often fail on insufficient power - use a self-powered USB hub or AC adapter
  • Watch dmesg live to see if it improves (dmesg -w)

Device disconnect (C)

On USB disconnect, /dev/sdX vanishes and every later operation returns EIO.

lsblk                       # is the device visible?
sudo dmesg -w               # watch the moment you reconnect

A mount on a vanished device is dead. Unmount, reconnect, and remount.

Network storage (D)

An NFS server outage or network fault also surfaces as EIO. Check the server and the path.

mount | grep nfs
ping <nfs-server>
showmount -e <nfs-server>    # are exports visible?

For a hung NFS mount, also read Fixing "Stale file handle" on NFS. Stale file handle (ESTALE) is easy to confuse with EIO but needs a different fix.
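
Because a hung NFS mount can block ordinary commands such as ls or df, it helps to probe the mount point under a time limit. This is a minimal sketch using timeout from GNU coreutils; probe_path is a hypothetical name and the 5-second default is an arbitrary choice.

```shell
# probe_path PATH [SECONDS]: stat PATH under a time limit and report the result.
# SIGKILL is used because a process blocked on NFS may ignore gentler signals.
probe_path() {
  timeout -s KILL "${2:-5}" stat "$1" >/dev/null 2>&1
  case $? in
    0)       echo "responsive" ;;
    124|137) echo "timed-out" ;;   # 137 = 128 + 9, killed after the limit
    *)       echo "error" ;;       # stat failed quickly (EIO, ESTALE, missing path, ...)
  esac
}

# Usage:
#   probe_path /mnt/nfs 5
```

A "timed-out" result points at an unresponsive server or path (D), while "error" means the mount answered with a failure and the kernel log should say which one.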

How do I rescue data in an emergency?

Conclusion: Pull data off a dying disk with ddrescue, not plain cp. It grabs the readable blocks first and skips bad ones, so you recover the maximum possible before the disk gives out.

cp aborts at the first EIO, and every retry re-reads the failing disk from the start. ddrescue (from gddrescue; apt install gddrescue, command name ddrescue) skips over bad areas and can resume from a map file.

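# /dev/sdb (failing disk) -> /dev/sdc (healthy destination, at least as large; it will be overwritten)
# the third argument is a map file that lets you pause and resume
sudo ddrescue -f -d -r3 /dev/sdb /dev/sdc rescue.map

  • -f - force; ddrescue requires it when the output is a device or partition rather than a regular file
  • -d - direct I/O (read real sectors, bypassing the OS cache)
  • -r3 - make up to 3 retry passes over bad areas
  • rescue.map - progress map, kept on a healthy disk; rerun the same command to resume from where it stopped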

Image the whole device/partition to the destination, then run fsck or file recovery against the copy. The goal is to minimize operations against the dying disk itself.
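
To see how far a rescue has got, the map file can be totalled by status. The sketch below assumes the GNU ddrescue mapfile layout (comment lines starting with #, one status line, then "pos size status" lines with hexadecimal numbers, where + means rescued and - means bad); map_summary is a hypothetical name, and the ddrescuelog tool shipped with ddrescue is likely the better choice when it is installed.

```shell
# map_summary MAPFILE: total the bytes in each ddrescue status class.
# Assumes the GNU ddrescue mapfile layout described in the lead-in.
map_summary() {
  rescued=0 bad=0 pending=0 seen_status_line=0
  while read -r pos size status rest; do
    case $pos in ''|'#'*) continue ;; esac   # skip blank lines and comments
    if [ "$seen_status_line" -eq 0 ]; then    # first data line is the status line
      seen_status_line=1
      continue
    fi
    case $status in
      '+') rescued=$((rescued + $size)) ;;
      '-') bad=$((bad + $size)) ;;
      *)   pending=$((pending + $size)) ;;    # not yet tried, trimmed, or scraped
    esac
  done < "$1"
  echo "rescued=$rescued bad=$bad pending=$pending"
}

# Usage:
#   map_summary rescue.map
```

The three numbers are byte counts. A large pending value means the run is not finished and is worth resuming before trying anything else on the source disk.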

Fast path: (1) dmesg -T | grep -i error to classify the cause (A-E) -> (2) on a medium error, confirm wear with smartctl -a -> (3) if worn, ddrescue immediately -> (4) for FS corruption, fsck after rescue (unmounted) -> (5) for link/disconnect/NFS, check the physical path. Skipping dmesg is the one mistake to avoid.

Summary / Next reading