Fixing SSH Disconnects: Keepalive and Timeouts
Why Does SSH Disconnect on Its Own?
Conclusion: Most often, a NAT device or firewall on the path, or sshd itself, silently kills an idle (no-traffic) connection. If the session dies with Broken pipe after a short pause, keepalive (small periodic packets) is not running. Setting ServerAliveInterval on the client or ClientAliveInterval on the server stops most of these drops.
If SSH is "fine while I'm typing but dead when I step away", idle timeout is almost always the culprit. A TCP connection survives silence for a while, but NAT devices, stateful firewalls, and load balancers on the path evict sessions that go quiet for too long. After eviction, your next keystroke has nowhere to go, and the session dies with messages like these:
client_loop: send disconnect: Broken pipe
Write failed: Broken pipe
packet_write_wait: Connection to 192.0.2.10 port 22: Broken pipe
Timeout, server not responding.
Conversely, if it drops while you are actively working, or at the exact same elapsed time every session, suspect an explicit server-side session limit or an unstable link (covered below). The fastest first step is to settle whether it dies idle or dies in use.
Prerequisites
- Client and server both Ubuntu / a typical Linux (OpenSSH)
- Edit ~/.ssh/config on the client; /etc/ssh/sshd_config on the server
- Applying server changes needs sudo and an sshd reload
Idle Drop or In-Use Drop — How Do You Tell?
Conclusion: If it dies while sitting idle, it is an idle timeout (NAT / firewall / ClientAliveInterval). If it dies even while you keep the screen updating (e.g. running top), suspect link quality or an explicit session cap. The former is fixed by keepalive; the latter needs a different approach.
To split the cause in two, first test whether constant traffic keeps it alive.
# Keeps the connection busy with light periodic output
$ ssh user@server 'while true; do date; sleep 30; done'
- It stops dropping → silence was the cause: an idle timeout. Keepalive fixes it (next section).
- It still drops → suspect a flaky link (Wi-Fi / mobile) or a fixed server-side timeout (other sshd_config settings, PAM, or a load balancer hard limit).
Holding the connection with ssh -v and reading the log at the moment it dies makes this conclusive.
$ ssh -v user@server
debug1: client_loop: send disconnect: Broken pipe
Whether keepalive traffic stalled just before the cut, or server not responding appeared, tells you which side gave up first.
If the time-to-drop is nearly constant every time, an artificial timeout is likely (NAT conntrack, or the server's ClientAliveInterval × ClientAliveCountMax). If it varies, lean toward a link-quality problem.
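To compare time-to-drop across sessions, it helps to measure it rather than estimate it. The following is a minimal sketch, assuming a POSIX shell; time_session is a hypothetical helper name used here for illustration, not an OpenSSH tool.

```shell
# time_session: hypothetical helper (not part of OpenSSH).
# Runs the given command, then prints how many whole seconds it lasted.
time_session() {
  ts_start=$(date +%s)
  "$@"                       # e.g. ssh user@server; a dropped session simply returns
  ts_end=$(date +%s)
  echo "$((ts_end - ts_start))"
}
```

Running `time_session ssh user@server` a few times and leaving each session idle gives one number per drop. Near-identical values point to an artificial timeout; scattered values point to link quality.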
Client Side: How Do You Set ServerAlive?
Conclusion: Put ServerAliveInterval 60 in the client's ~/.ssh/config first. Even with no traffic, ssh sends a request through the encrypted channel every 60 seconds, keeping the NAT session alive and detecting a dead server. It works from the client alone, even when you cannot touch the server (shared hosts, etc.).
ServerAliveInterval is the number of seconds of silence from the server after which ssh sends a request through the encrypted channel. The default is 0 (disabled). ServerAliveCountMax (default 3) is how many such probes may go unanswered before ssh disconnects.
# ~/.ssh/config
Host *
    ServerAliveInterval 60
    ServerAliveCountMax 3
With this, a small packet flows every 60 seconds, so the NAT/firewall session stays alive and idle drops are prevented. At the same time, if the server truly dies, ssh gives up after about 60s × 3 = 180s instead of leaving a zombie session.
To try it ad hoc, use -o.
$ ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=3 user@server
Host * applies it to every connection; to limit it, put the settings in a Host myserver block instead. Keep ~/.ssh/config at mode 600: ssh refuses a config file that other users can write to (the error reads "Bad owner or permissions").
Setting ServerAliveInterval too low means a brief blip burns through ServerAliveCountMax and makes drops more likely. On flaky links, do not shrink the interval aggressively; raise the count instead (e.g. Interval 30 / CountMax 6 gives ~180s of slack).
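The trade-off above is plain multiplication, which a small sketch can make explicit. detect_window is a hypothetical helper name, and the result is approximate, since ssh only promises to give up after roughly interval × count seconds.

```shell
# detect_window: hypothetical helper that multiplies interval by count.
# Approximates how long ssh waits on an unresponsive server before giving up.
detect_window() {
  echo "$(( $1 * $2 ))"
}

detect_window 60 3   # prints 180
detect_window 30 6   # prints 180
```

Both settings tolerate about the same outage, but the second sends probes twice as often, which suits a NAT with a shorter idle timeout.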
Server Side: How Do You Set ClientAlive?
Conclusion: If you manage the server, put ClientAliveInterval 60 in /etc/ssh/sshd_config. sshd then sends a request through the encrypted channel to any client that has gone quiet, so you prevent idle drops without configuring each client. Paired with ClientAliveCountMax, the same setting also decides how long sshd waits before dropping a client that has stopped answering.
ClientAliveInterval is the mirror of the client's ServerAliveInterval: sshd sends a probe after N seconds of silence from the client. Default 0 (disabled). ClientAliveCountMax defaults to 3.
# /etc/ssh/sshd_config
ClientAliveInterval 60
ClientAliveCountMax 3
Validate the syntax before applying. Break the sshd config here and you can lock yourself out, so always do this with a second session open.
# Syntax check (catches errors before they take effect)
$ sudo sshd -t
# Reload the config (existing sessions are preserved)
$ sudo systemctl reload ssh
# sshd -t prints nothing when the config is valid
Note the meaning flips with intent:
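- Keep connections alive: ClientAliveInterval 60 alone keeps keepalive flowing and beats NAT eviction. A larger CountMax is fine.
- Disconnect idle clients (e.g. a security requirement): older guides use ClientAliveInterval 300 / ClientAliveCountMax 0 to drop a silent client after about 5 minutes. That only worked before OpenSSH 8.2; since then, a CountMax of 0 disables termination entirely, and ClientAlive drops only clients that stop answering probes. On current releases, enforce idle limits another way, such as the shell's TMOUT variable or, in OpenSSH 9.2 and later, ChannelTimeout.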
A restart can drop existing connections; prefer reload to apply changes. Run sshd -t → reload with a second SSH session still open to guard against lockout. If ufw locks out even SSH, see ufw SSH Troubleshooting.
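To confirm which values sshd would actually use from the file, `sshd -T` prints the effective configuration with lowercased keywords. The sketch below filters that output; filter_client_alive is a hypothetical helper name, and on a server with Match blocks the values for a particular connection may differ from what this shows.

```shell
# filter_client_alive: hypothetical helper.
# Keeps only the ClientAlive* lines from `sshd -T` output supplied on stdin.
filter_client_alive() {
  grep -iE '^clientalive(interval|countmax) '
}

# Usage on the server:
#   sudo sshd -T | filter_client_alive
```

A line such as `clientaliveinterval 0` means the setting is still at its disabled default, for example because the edit landed in a file sshd does not read.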
How Does ServerAlive Differ from TCPKeepAlive?
Conclusion: ServerAliveInterval / ClientAliveInterval run over the SSH encrypted channel: they cannot be spoofed and let you tune the interval in seconds. TCPKeepAlive is TCP-layer keepalive, tied to the OS default (typically a 2-hour idle start) and spoofable. For idle-drop prevention, use the SSH-layer ServerAlive family.
The three commonly confused settings, by layer:
| Setting | Layer | Direction | Notes |
|---|---|---|---|
| ServerAliveInterval | SSH (encrypted) | client→server | Set on the client. Not spoofable |
| ClientAliveInterval | SSH (encrypted) | server→client | Set on the server. Not spoofable |
| TCPKeepAlive | TCP | both ways | Default yes. Interval is OS-dependent |
TCPKeepAlive yes (the OpenSSH default) does check liveness, but the idle time before the first TCP keepalive probe follows the OS setting (on Linux, net.ipv4.tcp_keepalive_time, default 7200 seconds = 2 hours). That is far too slow for a NAT that cuts in minutes. To outlast a short idle timeout, ServerAliveInterval, which is tunable to the second, is the right tool.
# For reference: the TCP keepalive idle start (seconds)
$ sysctl net.ipv4.tcp_keepalive_time
net.ipv4.tcp_keepalive_time = 7200
The official manuals (man ssh_config / man sshd_config) state that the alive messages are "sent through the encrypted channel and therefore will not be spoofable", whereas the TCP keepalive enabled by TCPKeepAlive is spoofable. The SSH-layer keepalive wins on both security and control.
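For a sense of scale, the time TCP keepalive needs to declare a silent peer dead can be worked out from three Linux sysctls. The sketch assumes the usual Linux defaults; the two extra names, net.ipv4.tcp_keepalive_intvl (75) and net.ipv4.tcp_keepalive_probes (9), are additions here and should be checked with sysctl on your own system. tcp_keepalive_worst_case is a hypothetical helper name.

```shell
# tcp_keepalive_worst_case: hypothetical helper.
# Arguments: idle seconds before the first probe, seconds between probes, probe count.
# Prints the idle time plus the time spent on every unanswered probe.
tcp_keepalive_worst_case() {
  echo "$(( $1 + $2 * $3 ))"
}

# Assumed Linux defaults: tcp_keepalive_time=7200, tcp_keepalive_intvl=75, tcp_keepalive_probes=9
tcp_keepalive_worst_case 7200 75 9   # prints 7875
```

That is over two hours, compared with about 180 seconds for ServerAliveInterval 60 with ServerAliveCountMax 3.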
What If It Still Drops After Configuring Keepalive?
Conclusion: If keepalive is set and it still drops, it is one of: ① a NAT / LB idle timeout shorter than your keepalive interval, ② the link itself flapping, or ③ another server-side timeout (PAM, a load balancer hard limit). The fix is to make the keepalive interval reliably shorter than that idle value.
Work through the usual suspects when keepalive is not enough.
# Look for keepalive / disconnect / timeout cues in the verbose log
$ ssh -v user@server 2>&1 | grep -iE 'alive|disconnect|timeout'
- NAT / firewall idle is short: home routers and cloud LBs can idle out in roughly 60–350 seconds. Set ServerAliveInterval clearly below it (if the LB is 60s, use 30).
- Link flapping: Wi-Fi / mobile links physically drop. Keepalive cannot save those; use tmux / mosh (below) to tolerate disconnects instead.
- Another server-side limit: a load balancer or bastion session cap, or a PAM rule, may cut at a fixed time. Check the server's reason with journalctl -u ssh.
# Check the server-side disconnect reason
$ sudo journalctl -u ssh -n 100 --no-pager
Behind a cloud load balancer (AWS NLB / GCP, etc.), the LB's own idle timeout dominates (AWS NLB defaults to 350 seconds). Always set the keepalive interval below that value.
A keepalive interval equal to or longer than the idle timeout is useless. "LB is 60s, so ServerAliveInterval 60" still slips through at the boundary. Aim for half or less (30 or below in this example).
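The "half or less" rule of thumb can be written down as a tiny calculation. This is only a sketch of that heuristic; suggest_interval is a hypothetical helper name, and halving is the article's guideline rather than anything OpenSSH mandates.

```shell
# suggest_interval: hypothetical helper.
# Prints half the idle timeout (seconds), rounded down, never below 1,
# because an interval of 0 would disable keepalive altogether.
suggest_interval() {
  si_half=$(( $1 / 2 ))
  if [ "$si_half" -lt 1 ]; then
    si_half=1
  fi
  echo "$si_half"
}

suggest_interval 60    # prints 30
suggest_interval 350   # prints 175
```

The result goes into ServerAliveInterval on the client or ClientAliveInterval on the server.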
What Should You Use to Survive Disconnects?
Conclusion: The deeper fix is to make a drop harmless. Run tmux / screen on the server and your session persists even when SSH dies; reconnect and attach to return. On flaky links, mosh reconnects automatically.
Keepalive keeps you connected; tmux / mosh keep a disconnect from hurting. Combine both for robustness.
# Start a tmux session on the server
$ tmux new -s work
# After SSH drops, reconnect and return to the session
$ ssh user@server
$ tmux attach -t work
Run long batches and risky commands inside tmux / screen so a link drop does not take the process down with it. If large scp / rsync transfers keep failing mid-way, use resumable rsync (see File Transfer Basics).
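The reconnect-then-attach routine can be folded into one step. The sketch below relies on tmux's `new -A`, which attaches to the named session if it exists and creates it otherwise; ssh_tmux is a hypothetical wrapper name, and the default session name work is taken from the example above.

```shell
# ssh_tmux: hypothetical wrapper.
# Connects with a TTY (-t) and attaches to the named tmux session,
# creating it first if it does not exist yet (tmux new -A).
ssh_tmux() {
  ssh -t "$1" "tmux new -A -s ${2:-work}"
}

# Usage:
#   ssh_tmux user@server          # session name defaults to "work"
#   ssh_tmux user@server batch    # use a session called "batch"
```

After a drop, running the same command again lands you back in the same session.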
mosh (mobile shell) is UDP-based and recovers automatically across IP changes and brief outages — ideal for mobile or unstable links. It does require installing mosh on the server and opening its UDP ports.
Summary and Checklist
Conclusion: If it dies idle, the first move is keepalive: ServerAliveInterval on the client or ClientAliveInterval on the server. Keep the interval below any NAT / LB idle timeout, and pair it with tmux / mosh so a drop is harmless. That clears up almost all surprise SSH disconnects.
- [ ] Split idle-drop vs in-use-drop (silence vs link quality)
- [ ] Set ServerAliveInterval / ServerAliveCountMax in the client's ~/.ssh/config
- [ ] If you own the server, set ClientAliveInterval in /etc/ssh/sshd_config, then sshd -t → reload
- [ ] Make the keepalive interval reliably shorter than the NAT / LB idle timeout
- [ ] Keep a second SSH session open when changing config to guard against lockout
- [ ] On flaky links, use tmux / screen / mosh to tolerate disconnects
- [ ] If it still drops, check the server-side reason with journalctl -u ssh