Fixing SSH Disconnects: Keepalive and Timeouts

2026-06-06 | Reading time: about 12 min | Difficulty: Intermediate

Why Does SSH Disconnect on Its Own?

Conclusion: Most often, a NAT device or firewall on the path (less often, a timeout configured on the server) silently drops an idle (no-traffic) connection. If the session dies with Broken pipe after a short pause, no keepalive (small periodic packets) is running. Setting ServerAliveInterval on the client or ClientAliveInterval on the server stops most of these drops.

If SSH is "fine while I'm typing but dead when I step away", idle timeout is almost always the culprit. A TCP connection survives silence for a while, but NAT devices, stateful firewalls, and load balancers on the path evict sessions that go quiet for too long. After eviction, your next keystroke has nowhere to go, and the session dies with messages like these:

client_loop: send disconnect: Broken pipe

Write failed: Broken pipe
packet_write_wait: Connection to 192.0.2.10 port 22: Broken pipe

Timeout, server not responding.

Conversely, if it drops while you are actively working, or at the exact same elapsed time every session, suspect an explicit server-side session limit or an unstable link (covered below). The fastest first step is to settle whether it dies idle or dies in use.

Prerequisites

Client and server both Ubuntu / a typical Linux (OpenSSH)
Edit ~/.ssh/config on the client; /etc/ssh/sshd_config on the server
Applying server changes needs sudo and an sshd reload

Idle Drop or In-Use Drop — How Do You Tell?

Conclusion: If it dies while sitting idle, it is an idle timeout (NAT / firewall / load balancer). If it dies even while you keep the screen updating (e.g. running top), suspect link quality or an explicit session cap. The former is fixed by keepalive; the latter needs a different approach.

To split the cause in two, first test whether constant traffic keeps it alive.

# Keeps the connection busy with light periodic output
$ ssh user@server 'while true; do date; sleep 30; done'

It stops dropping → silence was the cause: an idle timeout. Keepalive fixes it (next section).
It still drops → suspect a flaky link (Wi-Fi / mobile) or a fixed server-side timeout (other sshd_config settings, PAM, or a load balancer hard limit).

Holding the connection with ssh -v and reading the log at the moment it dies makes this conclusive.

$ ssh -v user@server

debug1: client_loop: send disconnect: Broken pipe

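The last lines before the cut tell you which side gave up first. Timeout, server not responding means the client's own ServerAlive probes went unanswered, so the client closed the session. Broken pipe means a write failed because the connection had already been torn down by the server or by a device on the path.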
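As a rough aid for reading such a log, the sketch below maps the disconnect messages quoted in this article to the side that most likely gave up. The helper name diagnose_drop and its output wording are hypothetical, and the mapping is a heuristic rather than a full parser of ssh output.

```shell
# diagnose_drop: read a saved `ssh -v` log on stdin and print a one-line hint.
# Hypothetical helper; it only recognizes the messages quoted in this article.
diagnose_drop() (
    log=$(cat)
    case $log in
        *"Timeout, server not responding"*)
            echo "client gave up: ServerAlive probes went unanswered" ;;
        *"Broken pipe"*)
            echo "write failed: the connection was already torn down elsewhere" ;;
        *)
            echo "no known disconnect message found" ;;
    esac
)

# Example: ssh -v user@server 2>ssh.log; diagnose_drop <ssh.log
```

The function body runs in a subshell, so the log variable does not leak into your interactive shell.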

If the time-to-drop is nearly constant every time, an artificial timeout is likely (a NAT conntrack or load balancer idle timer, or a server-side session limit). If it varies, lean toward a link-quality problem.
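If you note the elapsed seconds of a few dropped sessions, a small helper can make the "constant or varying" judgment less subjective. This is only a sketch: the name classify_drops and the 10% tolerance are assumptions chosen for illustration, not values from any tool.

```shell
# classify_drops SECONDS...: compare the spread of observed time-to-drop values.
# The 10% tolerance is an arbitrary assumption for illustration.
classify_drops() (
    min=$1
    max=$1
    for t in "$@"; do
        if [ "$t" -lt "$min" ]; then min=$t; fi
        if [ "$t" -gt "$max" ]; then max=$t; fi
    done
    # "Constant" here means the spread is at most 10% of the longest session.
    if [ $(( (max - min) * 10 )) -le "$max" ]; then
        echo "constant: likely an artificial timeout near ${min}s"
    else
        echo "varies: lean toward link quality"
    fi
)

classify_drops 298 301 300
classify_drops 45 600 130
```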

Client Side: How Do You Set ServerAlive?

Conclusion: Put ServerAliveInterval 60 in the client's ~/.ssh/config first. Even with no traffic, ssh sends a request through the encrypted channel every 60 seconds, keeping the NAT session alive and detecting a dead server. It works from the client alone, even when you cannot touch the server (shared hosts, etc.).

ServerAliveInterval is the number of seconds of silence from the server after which ssh sends a request through the encrypted channel. The default is 0 (disabled). ServerAliveCountMax (default 3) is how many such probes may go unanswered before ssh disconnects.

# ~/.ssh/config
Host *
    ServerAliveInterval 60
    ServerAliveCountMax 3

With this, a small packet flows every 60 seconds so the NAT/firewall session stays alive, preventing idle drops. At the same time, if the server truly dies, ssh gives up after 60s × 3 = ~180s instead of leaving a zombie session.

To try it ad hoc, use -o.

$ ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=3 user@server

Host * applies it to every connection; put the settings in a Host myserver block instead to limit them to one host. Keep ~/.ssh/config at mode 600: ssh refuses to use a config file that other users can write to and aborts with a Bad owner or permissions error.
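To confirm which values actually apply to a host once all Host blocks have been matched, ssh -G prints the effective client configuration without connecting. The sketch below filters that output; the helper name effective_keepalive is hypothetical, and myserver is a placeholder host alias.

```shell
# effective_keepalive: read `ssh -G <host>` output on stdin and print the
# keepalive values ssh would really use (ssh -G prints option names in lowercase).
effective_keepalive() {
    awk '$1 == "serveraliveinterval" { interval = $2 }
         $1 == "serveralivecountmax" { count = $2 }
         END { printf "interval=%s count=%s\n", interval, count }'
}

# Usage (myserver is a placeholder): ssh -G myserver | effective_keepalive
```

If the interval still shows 0, the Host pattern did not match or an earlier block set the option first, since ssh uses the first value it obtains for each option.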

Setting ServerAliveInterval too low means a brief blip burns through ServerAliveCountMax and makes drops more likely. On flaky links, do not shrink the interval aggressively; raise the count instead (e.g. Interval 30 / CountMax 6 gives ~180s of slack).
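The trade-off is easier to see with the arithmetic written out. The sketch below multiplies interval by count to get the approximate time ssh waits before giving up; keepalive_budget is a hypothetical helper name, and the third pair is an invented example of an overly aggressive setting.

```shell
# keepalive_budget INTERVAL COUNTMAX: approximate seconds of unanswered probes
# before ssh disconnects.
keepalive_budget() {
    echo "$(( $1 * $2 ))"
}

echo "Interval 60 / CountMax 3 -> ~$(keepalive_budget 60 3)s"
echo "Interval 30 / CountMax 6 -> ~$(keepalive_budget 30 6)s"
echo "Interval 10 / CountMax 3 -> ~$(keepalive_budget 10 3)s"
```

The first two pairs tolerate the same outage length, but 30 / 6 refreshes the NAT entry twice as often. The last pair gives up after roughly 30 seconds, which a brief Wi-Fi blip can easily exceed.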

Server Side: How Do You Set ClientAlive?

Conclusion: If you manage the server, put ClientAliveInterval 60 in /etc/ssh/sshd_config. sshd then sends a request through the encrypted channel to any client that has been silent for that long, so you prevent idle drops without configuring each client. The same probes also let sshd close sessions whose client has stopped responding; they do not disconnect a client that is merely idle.

ClientAliveInterval is the mirror of the client's ServerAliveInterval: sshd sends a probe after N seconds of silence from the client. Default 0 (disabled). ClientAliveCountMax defaults to 3.

# /etc/ssh/sshd_config
ClientAliveInterval 60
ClientAliveCountMax 3

Validate the syntax before applying. Break the sshd config here and you can lock yourself out, so always do this with a second session open.

# Syntax check (catches errors before they take effect; prints nothing when the config is valid)
$ sudo sshd -t

# Reload the config (existing sessions are preserved)
$ sudo systemctl reload ssh

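Note what CountMax does and does not do:

Keep connections alive: ClientAliveInterval 60 alone keeps keepalive flowing and beats NAT eviction. A larger CountMax is fine.
Drop dead clients sooner: ClientAliveInterval × ClientAliveCountMax is roughly how long an unresponsive client lingers (60 × 3 = about 180 seconds). ClientAliveCountMax 0 disables termination entirely.
Disconnect idle clients (e.g. a security requirement): the ClientAlive settings do not do this, because a healthy but idle client answers every probe and is never counted as unresponsive. Use a dedicated mechanism instead, such as the shell's TMOUT variable or, on recent OpenSSH releases, ChannelTimeout (see man sshd_config).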

A restart can drop existing connections; prefer reload to apply changes. Run sshd -t → reload with a second SSH session still open to guard against lockout. If ufw locks out even SSH, see ufw SSH Troubleshooting.
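The check-then-reload order can be wrapped so that a failed syntax check never reaches the reload step. This is a minimal sketch: reload_sshd_safely is a hypothetical name, and the unit name ssh follows the Ubuntu setup assumed in this article (other distributions may call it sshd).

```shell
# reload_sshd_safely: reload sshd only when the syntax check passes.
# Assumes the systemd unit is named "ssh", as on Ubuntu.
reload_sshd_safely() {
    if sudo sshd -t; then
        sudo systemctl reload ssh && echo "reloaded"
    else
        echo "sshd_config failed validation; not reloading" >&2
        return 1
    fi
}

# Usage: keep a second SSH session open, then run: reload_sshd_safely
```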

How Does ServerAlive Differ from TCPKeepAlive?

Conclusion: ServerAliveInterval / ClientAliveInterval run over the SSH encrypted channel — they cannot be spoofed and let you tune the interval in seconds. TCPKeepAlive is TCP-layer keepalive, tied to the OS default (typically a 2-hour idle start) and spoofable. For idle-drop prevention, use the SSH-layer ServerAlive family.

The three commonly confused settings, by layer:

Setting	Layer	Direction	Notes
`ServerAliveInterval`	SSH (encrypted)	client→server	Set on the client. Not spoofable
`ClientAliveInterval`	SSH (encrypted)	server→client	Set on the server. Not spoofable
`TCPKeepAlive`	TCP	both ways	Default `yes`. Interval is OS-dependent

TCPKeepAlive yes (the OpenSSH default) does check liveness, but the idle time before the first TCP keepalive probe follows the OS setting (on Linux, net.ipv4.tcp_keepalive_time, default 7200 seconds = 2 hours). That is far too slow for a NAT that cuts in minutes. To outlast a short idle timeout, ServerAliveInterval, which you set in seconds per connection, is the right tool.

# For reference: the TCP keepalive idle start (seconds)
$ sysctl net.ipv4.tcp_keepalive_time

net.ipv4.tcp_keepalive_time = 7200
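For a sense of scale, the sketch below adds up how long the kernel's TCP keepalive needs to declare a silent peer dead. Besides tcp_keepalive_time it uses two further Linux sysctls not mentioned above, net.ipv4.tcp_keepalive_intvl and net.ipv4.tcp_keepalive_probes, whose usual defaults are 75 and 9. The helper name and the second example's values are hypothetical.

```shell
# tcp_keepalive_detect [IDLE] [INTVL] [PROBES]: seconds until the kernel gives up
# on a silent peer. Defaults are the usual Linux values (7200 / 75 / 9).
tcp_keepalive_detect() (
    idle=${1:-7200}
    intvl=${2:-75}
    probes=${3:-9}
    echo "$(( idle + intvl * probes ))"
)

tcp_keepalive_detect            # Linux defaults: 7200 + 75 * 9
tcp_keepalive_detect 60 10 3    # a hypothetical, aggressively tuned host
```

With the defaults the first probe leaves after two hours and the verdict takes over two hours in total, which is why the OS-level keepalive does nothing against a NAT that forgets sessions within minutes.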

The official manuals (man ssh_config / man sshd_config) state that the alive messages are "sent through the encrypted channel and therefore will not be spoofable", whereas the TCP keepalive enabled by TCPKeepAlive is spoofable. The SSH-layer keepalive wins on both security and control.

What If It Still Drops After Configuring Keepalive?

Conclusion: If keepalive is set and it still drops, it is one of: ① a NAT / LB idle timeout shorter than your keepalive interval, ② the link itself flapping, or ③ another server-side timeout (PAM, a load balancer hard limit). The fix is to make the keepalive interval reliably shorter than that idle value.

Work through the usual suspects when keepalive is not enough.

# Look for keepalive / disconnect / timeout cues in the verbose log
$ ssh -v user@server 2>&1 | grep -iE 'alive|disconnect|timeout'

NAT / firewall idle is short: home routers and cloud LBs can idle out in roughly 60–350 seconds. Set ServerAliveInterval clearly below it (if the LB is 60s, use 30).
Link flapping: Wi-Fi / mobile links physically drop. Keepalive cannot save those — use tmux / mosh (below) to tolerate disconnects instead.
Another server-side limit: a load balancer or bastion session cap, or a PAM rule, may cut at a fixed time. Check the server's reason with journalctl -u ssh.

# Check the server-side disconnect reason
$ sudo journalctl -u ssh -n 100 --no-pager

Behind a cloud load balancer (AWS NLB / GCP, etc.), the LB's own idle timeout dominates (AWS NLB defaults to 350 seconds). Always set the keepalive interval below that value.

A keepalive interval equal to or longer than the idle timeout is useless. "LB is 60s, so ServerAliveInterval 60" still slips through at the boundary. Aim for half or less (30 or below in this example).
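The "half or less" rule is simple enough to compute. The sketch below halves a known idle timeout to suggest a ServerAliveInterval; pick_interval is a hypothetical helper, and the two timeouts are the 60-second and 350-second examples from this section.

```shell
# pick_interval IDLE_TIMEOUT: suggest a ServerAliveInterval at half the idle
# timeout (integer division, so odd values round down).
pick_interval() {
    echo "$(( $1 / 2 ))"
}

echo "LB idle 60s   -> ServerAliveInterval $(pick_interval 60)"
echo "NLB idle 350s -> ServerAliveInterval $(pick_interval 350)"
```

Halving leaves room for one delayed or lost probe before the idle timer expires. Going lower than the suggested value is fine.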

What Should You Use to Survive Disconnects?

Conclusion: The deeper fix is to make a drop harmless. Run tmux / screen on the server and your session persists even when SSH dies — reconnect and attach to return. On flaky links, mosh reconnects automatically.

Keepalive keeps you connected; tmux / mosh keep a disconnect from hurting. Combine both for robustness.

# Start a tmux session on the server
$ tmux new -s work

# After SSH drops, reconnect and return to the session
$ ssh user@server
$ tmux attach -t work

Run long batches and risky commands inside tmux / screen so a link drop does not take the process down with it. If large scp / rsync transfers keep failing mid-way, use resumable rsync (see File Transfer Basics).
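The reconnect-and-attach steps can be collapsed into one command, assuming tmux is installed on the server. In the sketch below, ssh_work is a hypothetical wrapper name; tmux new-session -A attaches to the named session if it already exists and creates it otherwise.

```shell
# ssh_work HOST [SESSION]: connect and land in a tmux session in one step.
# -t forces a terminal for tmux; new-session -A attaches when the session exists.
ssh_work() {
    ssh -t "$1" tmux new-session -A -s "${2:-work}"
}

# Usage: ssh_work user@server        (session name defaults to "work")
```

After a drop, running the same command again returns you to the session you left.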

mosh (mobile shell) is UDP-based and recovers automatically across IP changes and brief outages — ideal for mobile or unstable links. It does require installing mosh on the server and opening its UDP ports.

Summary and Checklist

Conclusion: If it dies idle, the first move is keepalive — ServerAliveInterval on the client or ClientAliveInterval on the server. Keep the interval below any NAT / LB idle timeout, and pair it with tmux / mosh so a drop is harmless. That clears up almost all surprise SSH disconnects.

[ ] Split idle-drop vs in-use-drop (silence vs link quality)
[ ] Set ServerAliveInterval / ServerAliveCountMax in the client's ~/.ssh/config
[ ] If you own the server, set ClientAliveInterval in /etc/ssh/sshd_config, then sshd -t → reload
[ ] Make the keepalive interval reliably shorter than the NAT / LB idle timeout
[ ] Keep a second SSH session open when changing config to guard against lockout
[ ] On flaky links, use tmux / screen / mosh to tolerate disconnects
[ ] If it still drops, check the server-side reason with journalctl -u ssh