Shorten default TCP keepalive time #
- Date proposed: 2025-02-21
- RFC MR: https://gitlab.archlinux.org/archlinux/rfcs/-/merge_requests/51
Summary #
Set our default net.ipv4.tcp_keepalive_time to override the extremely conservative kernel default.
Motivation #
The Linux kernel supports “TCP keepalive”, which signals to the other endpoint, and to anything in between, that a TCP connection currently transferring no data is still connected.
With the default parameters, the kernel sends the first keepalive probe only after the connection has been idle for 2 hours, then repeats it every 75 seconds while no reply arrives.
While 2 hours is well within the 120 hours for which Linux keeps NAT mappings of established connections, other products such as AVM routers can have much shorter timeouts, for example 15 minutes.
As a result, idle TCP connections going through such a router will lose their association much earlier than expected, leading to a disconnect. We have seen SSH sessions to our build server get disconnected because a build step took too long to complete while producing no output, leading to a build failure.
Windows used to have the same timeout of 2 hours, but as of version 8.1 has reduced this to 2 minutes. We should follow this.
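The timing argument above can be checked with a little arithmetic. The following is a minimal sketch, not part of the proposal: the function and constant names are made up for illustration, and it assumes that keepalive is enabled on the socket and that a NAT device refreshes its mapping whenever any packet of the connection passes through.

```python
# All values in seconds; the figures are the ones quoted in the text above.
MINUTE = 60
HOUR = 60 * MINUTE

KERNEL_DEFAULT_IDLE = 2 * HOUR    # kernel default: first probe after 2 hours idle
TWO_MINUTE_IDLE = 2 * MINUTE      # the 2-minute value mentioned above

LINUX_NAT_TIMEOUT = 120 * HOUR    # Linux NAT mapping for established connections
SHORT_NAT_TIMEOUT = 15 * MINUTE   # example of a much shorter router timeout


def probe_arrives_in_time(keepalive_idle: int, nat_timeout: int) -> bool:
    """Return True if the first keepalive probe is sent before the NAT mapping expires.

    Hypothetical helper: it models the NAT device as a simple idle timer.
    """
    return keepalive_idle < nat_timeout


if __name__ == "__main__":
    idles = {"2 hours": KERNEL_DEFAULT_IDLE, "2 minutes": TWO_MINUTE_IDLE}
    nats = {"120 hours": LINUX_NAT_TIMEOUT, "15 minutes": SHORT_NAT_TIMEOUT}
    for idle_name, idle in idles.items():
        for nat_name, nat in nats.items():
            ok = probe_arrives_in_time(idle, nat)
            print(f"keepalive after {idle_name}, NAT timeout {nat_name}: "
                  f"{'mapping kept' if ok else 'mapping lost'}")
```

Under these assumptions, only the combination of a 2-hour keepalive time and a 15-minute NAT timeout loses the mapping, which matches the disconnects described above.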
Specification #
Add the following line to /usr/lib/sysctl.d/10-arch.conf in the filesystem package:
net.ipv4.tcp_keepalive_time = 120
This will start the keepalive probes after two minutes of idle time.
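To see whether the setting is active on a running system, one can compare the configured line with the value the kernel exposes under /proc/sys. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the usual mapping of sysctl keys to /proc/sys paths, where each dot becomes a directory separator. Note that the value is a system-wide default that only affects sockets on which the application has enabled SO_KEEPALIVE.

```python
from pathlib import Path

# The line proposed above for 10-arch.conf.
LINE = "net.ipv4.tcp_keepalive_time = 120"


def parse_sysctl_line(line: str) -> tuple[str, str]:
    """Split a 'key = value' sysctl line into (key, value)."""
    key, _, value = line.partition("=")
    return key.strip(), value.strip()


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key to its file below `root` (assumed dot-to-slash mapping)."""
    return Path(root, *key.split("."))


def is_applied(line: str, root: str = "/proc/sys") -> bool:
    """Return True if the running kernel reports the value given in `line`."""
    key, value = parse_sysctl_line(line)
    path = sysctl_path(key, root)
    return path.is_file() and path.read_text().split() == value.split()


if __name__ == "__main__":
    state = "active" if is_applied(LINE) else "not active"
    print(f"{LINE!r} is {state} on this system")
```

The `root` parameter exists only so that the check can be pointed at a test directory instead of the real /proc/sys.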
Drawbacks #
This will make quiescent connections slightly more expensive to maintain, but in practice this should be negligible.
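As a rough estimate of that cost, the sketch below counts keepalive probes per hour for one idle connection. It is an approximation under stated assumptions: the peer answers every probe, so the idle timer restarts each time and probes recur once per keepalive time, and the peer's ACK replies are not counted. The function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600


def probes_per_hour(keepalive_idle: int) -> float:
    """Approximate keepalive probes sent per hour on one idle, healthy connection."""
    return SECONDS_PER_HOUR / keepalive_idle


if __name__ == "__main__":
    for idle in (7200, 120):  # kernel default and the proposed value, in seconds
        print(f"tcp_keepalive_time={idle}: about {probes_per_hour(idle)} probes per hour")
```

Each probe is a small TCP segment without payload, so even the higher rate amounts to very little traffic per connection.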
Unresolved Questions #
Are there any other related settings we should modify?
Alternatives Considered #
- Some programs support application-level keepalive. We recently enabled such a setting for our SSH servers to avoid the problems mentioned earlier; see infrastructure!928.