0051 Shorten default TCP keepalive time

Summary

Ship our own default for net.ipv4.tcp_keepalive_time, overriding the extremely conservative kernel default.

Motivation

The Linux kernel supports “TCP keepalive”. This feature signals to the other endpoint, and to anything in between, that a TCP connection which is currently transferring no data is still alive. It only applies to sockets that enable it with the SO_KEEPALIVE option.

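By default, the kernel sends the first keepalive probe only after a connection has been idle for 2 hours (net.ipv4.tcp_keepalive_time = 7200). If a probe goes unanswered, the kernel retransmits it every 75 seconds (net.ipv4.tcp_keepalive_intvl). After 9 unanswered probes (net.ipv4.tcp_keepalive_probes), it drops the connection.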
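Keepalive is opt-in per socket, so a program can also enable it and override these three system-wide defaults for a single connection. The following Python sketch only illustrates that mechanism and is not part of this proposal. It uses the Linux-specific socket options TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT. The helper name enable_keepalive is hypothetical.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 120,
                     interval: int = 75, probes: int = 9) -> None:
    """Hypothetical helper: enable TCP keepalive on one socket (Linux only)."""
    # Keepalive is opt-in: without SO_KEEPALIVE the kernel never sends probes,
    # whatever net.ipv4.tcp_keepalive_time is set to.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket counterparts of tcp_keepalive_time, tcp_keepalive_intvl
    # and tcp_keepalive_probes; they take precedence over the sysctl defaults.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

A socket that sets SO_KEEPALIVE but not TCP_KEEPIDLE picks up whatever the system-wide net.ipv4.tcp_keepalive_time is. Programs in that situation are the ones a distribution-wide default would affect.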

Two hours is well within the 120 hours for which Linux keeps NAT mappings of established connections (net.netfilter.nf_conntrack_tcp_timeout_established). Other products, such as AVM routers, can use much shorter timeouts, for example 15 minutes.

As a result, such a router discards the NAT mapping of an idle TCP connection long before the first keepalive probe is sent, and the connection breaks. We have seen this disconnect SSH sessions to our build server: a build step ran for a long time without producing output, and the build failed.

Windows used to have the same 2-hour timeout, but as of version 8.1 it has reduced this to 2 minutes. We should follow suit.

Specification

Add the following line to /usr/lib/sysctl.d/10-arch.conf in the filesystem package:

net.ipv4.tcp_keepalive_time = 120

This will start the keepalive probes after two minutes of idle time.
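The following Python sketch makes the effect concrete. It computes when the first probe is sent and when an unresponsive peer is declared dead. It assumes that net.ipv4.tcp_keepalive_intvl and net.ipv4.tcp_keepalive_probes stay at the kernel defaults of 75 seconds and 9 probes. It also compares the first probe against the 15-minute router timeout from the Motivation section. The helper keepalive_timeline is hypothetical.

```python
NAT_TIMEOUT = 15 * 60  # example router timeout (15 minutes), in seconds


def keepalive_timeline(keepalive_time: int, intvl: int = 75, probes: int = 9):
    """Hypothetical helper returning (seconds idle before the first probe,
    seconds of silence before an unresponsive peer is declared dead)."""
    # First probe after keepalive_time; then up to `probes` unanswered probes,
    # spaced `intvl` apart, before the kernel drops the connection.
    return keepalive_time, keepalive_time + probes * intvl


for label, t in (("kernel default", 7200), ("proposed", 120)):
    first, dead = keepalive_timeline(t)
    print(f"{label}: first probe {first} s, dead peer {dead} s, "
          f"beats NAT timeout: {first < NAT_TIMEOUT}")
# kernel default: first probe 7200 s, dead peer 7875 s, beats NAT timeout: False
# proposed: first probe 120 s, dead peer 795 s, beats NAT timeout: True
```

On a live but idle connection, each answered probe resets the idle timer. Probes therefore recur every tcp_keepalive_time seconds, which keeps the router's NAT mapping fresh.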

Drawbacks

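Idle connections with keepalive enabled become slightly more expensive to maintain. Each one exchanges a small probe and its acknowledgement every two minutes instead of every two hours. In practice, this overhead should be negligible.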

Unresolved Questions

Are there any other related settings we should modify?

Alternatives Considered

  • Some programs support application-level keepalive. We recently enabled such a setting for our SSH servers to avoid the problem described above (see infrastructure!928).