Fixing "Stale file handle" on NFS - Remounting the Share

What does "Stale file handle" actually mean?

Conclusion: The NFS client is holding a file handle that no longer matches the object on the server (ESTALE). NFS identifies objects by handle, not path, so server-side changes are the trigger.

On an NFS mount, ls, cat, or saving a file fails like this:

ls: cannot access '/mnt/nfs/data': Stale file handle

NFS identifies a file by a file handle, not by its path. A handle is composed roughly of three parts:

  • fsid — identifier of the exported filesystem
  • inode number — the inode of the target file or directory
  • generation number — distinguishes inodes that have been reused

The client caches this handle at mount time and on access, then reuses it for later operations. If the object the handle points to disappears or changes on the server, the next access makes the kernel return ESTALE (Stale file handle).

Key point: This is not a disk failure. The client is simply holding an outdated reference, and most cases are fixed by remounting on the client. Suspect the client side before touching the server.
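To tell ESTALE apart from other failures in a script, one option is to inspect the error text that stat prints. The following is a minimal sketch; the function name is_stale is hypothetical, and it assumes a stat whose C-locale message contains "Stale file handle", as GNU coreutils does on Linux.

```shell
# is_stale PATH
# Returns 0 when stat fails with ESTALE, 1 in every other case
# (path is fine, path is missing, permission denied, ...).
# LC_ALL=C forces the English error text so the match is locale-independent.
is_stale() {
    err=$(LC_ALL=C stat -- "$1" 2>&1 >/dev/null) || true
    case $err in
        *"Stale file handle"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `if is_stale /mnt/nfs/data; then echo "stale handle"; fi` prints the message only when the kernel returned ESTALE for that path.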

Why does a Stale file handle happen?

Conclusion: It happens when the object a handle points to changes on the server. The four big causes are deleting/recreating a file, changing exports, an fsid change on server reboot, and inode/generation changes from a restore.

Here are the common triggers, ordered by how often you hit them in practice.

  • A. A file or directory is deleted and recreated on the server (most common) — if a file is removed and created again through another path while the client has it open, the inode/generation changes and the handle goes stale
  • B. Export configuration changes — editing /etc/exports and running exportfs -r, or changing an export path or options, breaks the assumptions behind existing handles
  • C. The fsid changes on server reboot — if /etc/exports does not pin fsid=, the server may auto-assign a different fsid on reboot, invalidating existing handles
  • D. Restore from backup or snapshot — a restore can change a file's inode or generation number, so the same path now resolves to a different handle

A and D are cases where "the path is identical but the object (inode) underneath was swapped out." It looks fine by filename, which makes the root cause hard to spot. When you see ESTALE, first suspect that the object was replaced on the server.
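The "same path, different object" effect can be reproduced on any local filesystem, without NFS. The sketch below is only an illustration: it replaces a file by renaming a new one over it, which is one common way an object gets swapped, and compares inode numbers before and after. The function names and file names are hypothetical.

```shell
# inode_of FILE: print the inode number of FILE.
inode_of() {
    ls -di -- "$1" | awk '{print $1}'
}

# replace_changes_inode: create a file, replace it via rename, and report
# whether the inode behind the unchanged path is now different.
# Returns 0 when the inode changed.
replace_changes_inode() {
    dir=$(mktemp -d) || return 1
    echo "v1" > "$dir/data"
    before=$(inode_of "$dir/data")
    echo "v2" > "$dir/data.new"          # a new object with its own inode
    mv -- "$dir/data.new" "$dir/data"    # same path, different object
    after=$(inode_of "$dir/data")
    rm -rf -- "$dir"
    echo "before=$before after=$after"
    [ "$before" != "$after" ]
}
```

Running `replace_changes_inode` prints two different inode numbers for the same filename. An NFS client that cached a handle for the first object would get ESTALE on its next access.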

Check the situation first

Conclusion: Identify which NFS mount is affected and which processes are holding it before acting. Use findmnt for mounts and lsof / fuser for the processes using them.

1. Identify the NFS mount

$ findmnt -t nfs,nfs4
TARGET       SOURCE                  FSTYPE OPTIONS
/mnt/nfs     192.168.10.5:/export    nfs4   rw,relatime,vers=4.2,...
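In a script, the same information can be read from the kernel's mount table. This is a rough sketch, assuming /proc/mounts-style input (source, mount point, type, options); the function name nfs_targets is hypothetical, and mount points containing spaces appear in escaped form (\040) in that file.

```shell
# nfs_targets: read /proc/mounts-style lines on stdin and print the
# mount point (field 2) of every entry whose type (field 3) is nfs or nfs4.
nfs_targets() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }'
}
```

Typical use is `nfs_targets < /proc/mounts`. `findmnt -t nfs,nfs4 -n -o TARGET` should give an equivalent list where findmnt is available.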

2. Reproduce the ESTALE

$ ls /mnt/nfs
ls: reading directory '/mnt/nfs': Stale file handle

3. Find processes holding the mount

A remount requires that nothing is using the target mount. Identify the holding processes first.

$ lsof +D /mnt/nfs 2>/dev/null
$ fuser -vm /mnt/nfs

Even your own shell sitting inside the mount with cd counts as busy. Step out of the mount with cd / first, then try the remount. That alone often lets umount succeed.
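A script can check whether its own working directory is what keeps the mount busy. The helper below is a small sketch; path_inside is a hypothetical name, and it compares path strings only, so symlinks are not resolved.

```shell
# path_inside DIR MOUNTPOINT
# Returns 0 when DIR is MOUNTPOINT itself or lies below it, 1 otherwise.
# Trailing slashes are normalized so /mnt/nfs2 does not match /mnt/nfs.
path_inside() {
    case "${1%/}/" in
        "${2%/}/"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `path_inside "$PWD" /mnt/nfs && cd /` steps out of the mount only when the shell is actually inside it.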

How do you recover? (client side)

Conclusion: The basic fix is "unmount, then remount" to re-fetch the handle. If umount fails because the target is busy, use a lazy unmount (umount -l); if the server is unresponsive and I/O is hanging, escalate to force (umount -f).

Step 1: Normal unmount and remount

$ cd /
$ sudo umount /mnt/nfs
$ sudo mount /mnt/nfs

If there is an /etc/fstab entry, mount /mnt/nfs alone remounts it. This resolves most cases.

Step 2: Lazy unmount when it is busy

When umount fails with target is busy, check whether you can stop the referencing processes, then use a lazy unmount.

$ sudo umount -l /mnt/nfs   # lazy: detach now, clean up once references clear
$ sudo mount /mnt/nfs

-l (lazy) detaches the mount point from the namespace immediately and releases it once the last reference goes away, so you can proceed to the remount even in a busy environment.

Step 3: Force unmount when the server is unresponsive

If the server is down or unreachable and I/O is hanging, use force.

$ sudo umount -f /mnt/nfs

umount -f can lose unwritten data. If an application is mid-write, stop it first where possible. Use force / lazy only as escalating last resorts.
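To decide between the lazy unmount in Step 2 and the force unmount in Step 3, it helps to know whether the mount still answers at all. The sketch below wraps stat in the coreutils timeout command; probe_mount and the 5-second default are assumptions for illustration. It also assumes the blocked stat can be terminated by timeout's signal, which may not hold in every hang.

```shell
# probe_mount PATH [SECONDS]
# Prints "ok" if stat returns in time, "hung" if timeout had to stop it
# (exit status 124), and "error" for any other failure, ESTALE included.
probe_mount() {
    rc=0
    timeout "${2:-5}" stat -- "$1" >/dev/null 2>&1 || rc=$?
    case $rc in
        0)   echo ok ;;
        124) echo hung ;;
        *)   echo error ;;
    esac
}
```

A result of "error" on a mount that reported ESTALE points to the ordinary remount, while "hung" suggests the server is not answering and the force unmount may be needed.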

Step 4: When only a single file is stale

If only a specific file or directory is stale rather than the whole mount, stepping out of and back into the directory can clear it.

$ cd /
$ cd /mnt/nfs/data   # re-fetch the handle

If it persists, move on to the remount in Steps 1-3.

What to check on the server

Conclusion: If a client remount does not help, or the problem hits all clients, suspect the server. Check the export state and fsid consistency.

Check the export state

$ sudo exportfs -v

Confirm that the export paths and options are what you intended. If you just changed /etc/exports, re-export to make sure it is applied.

$ sudo exportfs -ra

Check that fsid is pinned

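If Stale appears after every server reboot, the fsid is likely changing. Without fsid=, the server derives it from the filesystem's UUID or, failing that, from the device number, and a device number is not guaranteed to stay the same across reboots. Pin it explicitly in /etc/exports.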

# /etc/exports (server side)
/export  192.168.10.0/24(rw,sync,fsid=0,no_subtree_check)

fsid=0 is the NFSv4 pseudo-root. Assign a unique value (fsid=1, fsid=2, ...) or a UUID to each additional export. Apply changes with exportfs -ra.
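A quick way to audit an exports file is to list the entries that carry no fsid= option. The following is a minimal sketch with a hypothetical function name; it treats each line as one export, so backslash continuations and quoted paths containing spaces are not handled.

```shell
# exports_without_fsid: read exports(5)-style lines on stdin and print the
# path (first field) of every export line that has no fsid= option.
exports_without_fsid() {
    awk '
        /^[ \t]*(#|$)/ { next }        # skip comments and blank lines
        !/fsid=/       { print $1 }    # no fsid= anywhere on the line
    '
}
```

On the server, `exports_without_fsid < /etc/exports` prints nothing when every export is pinned.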

How do you prevent it?

Conclusion: Pin fsid on the server and avoid deleting or recreating in-use files directly on the server. Mount options can also soften hangs.

  • Pin fsid= explicitly — the most effective measure against fsid changing on reboot
  • Do not modify in-use files directly on the server — delete or replace through the client, or when no client is referencing them
  • Make export changes during a maintenance window — plan /etc/exports edits and re-exports together with client remounts
  • Understand soft vs hard: hard (default) keeps retrying until the server returns, which favors data integrity but hangs easily on outages; soft gives up on timeout, hanging less but risking lost writes. Choose by use case
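To see which behavior a mount currently has, the option string from the mount table can be inspected. This is a small sketch; retry_mode is a hypothetical name, and it reports "hard" whenever soft is absent, matching the default described above.

```shell
# retry_mode OPTIONS
# Prints "soft" if the comma-separated option string contains the soft
# option as a whole word, otherwise "hard" (the default).
retry_mode() {
    case ",$1," in
        *,soft,*) echo soft ;;
        *)        echo hard ;;
    esac
}
```

For example, `retry_mode "$(findmnt -n -o OPTIONS /mnt/nfs)"` reports the mode of the mount used throughout this article.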

Copy-paste: client-side recovery template

# 1. Step out of the mount
cd /

# 2. Which mount is affected, and who holds it
findmnt -t nfs,nfs4
fuser -vm /mnt/nfs

# 3. Remount (falls back to -l if busy; if the server is hung, run sudo umount -f /mnt/nfs by hand)
sudo umount /mnt/nfs || sudo umount -l /mnt/nfs
sudo mount /mnt/nfs

Summary

  • Stale file handle (ESTALE) means the NFS client is holding an outdated file handle; it is not a disk failure
  • The cause is a server-side change of the object (delete/recreate, export change, fsid change, restore)
  • The basic fix is a client-side remount. Use umount -l when busy and umount -f when hung, escalating in steps
  • If it hits all clients or recurs on every reboot, check that fsid is pinned on the server
