Thursday 28 March 2019

Netfilter's conntrack

People who use Linux for firewalling tend to use iptables to set up their rules.  The subsystem in the Linux kernel that actually does the firewalling is called Netfilter.

I've never found a complete description of all of Netfilter's features, especially some of the lesser-used ones.  So here is a bit of an overview, including a few recent discoveries that I've not seen documented elsewhere:

Netfilter includes a connection tracker, which can keep track of each flow that the system is handling.  Each flow has a 32-bit value called the connection mark (connmark), which you can use for anything you like.  This mark allows you to record 32 bits of information that persists for as long as the flow does, rather than having to treat each packet in complete isolation.
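Since 32 bits is all you get, it is common to carve the connmark up into several small fields.  The following is a minimal Python sketch of that idea, not kernel code.  The set_xmark function models the "clear the masked bits, then XOR the value in" behaviour that the iptables-extensions man page describes for CONNMARK's --set-xmark option; the field layout (traffic class in the low 8 bits, flags in the next 8) is purely a made-up example.

```python
MARK_BITS = 0xFFFFFFFF  # the connmark is an unsigned 32-bit value


def set_xmark(ctmark: int, value: int, mask: int = MARK_BITS) -> int:
    """Clear the bits selected by mask, then XOR value into the mark."""
    return ((ctmark & ~mask) ^ value) & MARK_BITS


def get_field(ctmark: int, shift: int, width: int) -> int:
    """Extract a width-bit field that starts at bit position shift."""
    return (ctmark >> shift) & ((1 << width) - 1)


# Hypothetical layout: bits 0-7 = traffic class, bits 8-15 = flags.
mark = 0
mark = set_xmark(mark, 0x2A, 0x000000FF)       # traffic class = 0x2A
mark = set_xmark(mark, 0x01 << 8, 0x0000FF00)  # flags = 0x01

traffic_class = get_field(mark, 0, 8)
flags = get_field(mark, 8, 8)
```

Writing one field with a mask leaves the other fields untouched, which is what lets independent rules share the same 32 bits.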

Packets traversing the system are always in one of the following connection tracking states: UNTRACKED, INVALID, NEW, ESTABLISHED, RELATED.

UNTRACKED and INVALID refer to packets that are either explicitly being excluded from connection tracking, or that the connection tracker doesn't think are valid for the current state of the flows that it knows about.

When a new flow is established, the first packet is in either the NEW or RELATED state, and subsequent packets are in the ESTABLISHED state.  RELATED means that Netfilter thinks that the new flow is somehow related to another flow, and therefore shouldn't be handled in complete isolation.

I've seen information elsewhere that says that when a flow starts in the RELATED state, it inherits the connmark from the parent.  Experimentation shows that this isn't entirely accurate (or at least, not entirely clear).  It turns out that flows that start in the RELATED state permanently share the same connmark data with the flow(s) that they are related to.  This means that if any of the flows change their connmark, those changes also affect any other flows that they are related to.
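The difference between "inherits" and "shares" is easy to blur, so here is a toy Python model of the two behaviours.  It only illustrates the semantics I observed; it says nothing about how the kernel actually stores the mark, and the names MarkCell, Flow, related_by_copy and related_by_sharing are all invented for this sketch.

```python
class MarkCell:
    """A single storage slot for a 32-bit connmark."""

    def __init__(self, value: int = 0):
        self.value = value & 0xFFFFFFFF


class Flow:
    """A toy flow whose connmark lives in a (possibly shared) MarkCell."""

    def __init__(self, name: str, cell: MarkCell):
        self.name = name
        self.cell = cell

    @property
    def connmark(self) -> int:
        return self.cell.value

    @connmark.setter
    def connmark(self, value: int) -> None:
        self.cell.value = value & 0xFFFFFFFF


def related_by_copy(parent: Flow, name: str) -> Flow:
    """What "inherits" would suggest: a one-off copy of the parent's mark."""
    return Flow(name, MarkCell(parent.connmark))


def related_by_sharing(parent: Flow, name: str) -> Flow:
    """What experimentation suggests: both flows use the same storage."""
    return Flow(name, parent.cell)


parent = Flow("parent", MarkCell(7))
copied = related_by_copy(parent, "copied-child")
shared = related_by_sharing(parent, "shared-child")

parent.connmark = 9   # change the parent after the children exist
shared.connmark = 11  # a change made through the child is seen by the parent
```

After these two assignments the copied child still holds 7, while the parent and the shared child both hold 11.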

The REJECT filter target asks the kernel to drop the packet being processed and reply with some kind of packet that indicates that it was rejected.  For example, "-j REJECT --reject-with tcp-reset" will respond with a TCP RST packet.  The response packet originates in the OUTPUT chains and has a state of RELATED, rather than being considered part of the original connection as you might expect.

In the case of rejecting connections with a TCP RST packet, the RST will, of course, have the same 5-tuple as the original TCP connection.  There doesn't appear to be any way of accessing a unique ID that identifies the flow, so as far as I can tell it is (probably) impossible for an external application to reliably tell the difference between packets belonging to the original (rejected) flow, and packets belonging to the related flow that carries the RST.
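To make the problem concrete, here is a small Python sketch of how an external application might key its own flow table.  It assumes the application only sees addresses, ports and protocol, and the FiveTuple and flow_key names are invented for illustration.  The addresses come from the documentation ranges.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    proto: str
    src: str
    sport: int
    dst: str
    dport: int


def flow_key(t: FiveTuple) -> tuple:
    """Direction-independent key: both directions of a flow map to one key."""
    endpoints = sorted([(t.src, t.sport), (t.dst, t.dport)])
    return (t.proto, endpoints[0], endpoints[1])


# The SYN that gets rejected, and the RST sent back in response to it.
syn = FiveTuple("tcp", "192.0.2.10", 40000, "198.51.100.5", 80)
rst = FiveTuple("tcp", "198.51.100.5", 80, "192.0.2.10", 40000)

same_key = flow_key(syn) == flow_key(rst)
```

Both packets produce the same key, so a table built this way cannot tell that the connection tracker considers them to belong to two different flows.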

It is a shame that a flow ID isn't made available to user applications through the NFLOG / NFQUEUE interface.  Some poking around suggests that a flow ID *might* be available through the NFQA_CT section of the netlink message, so that may warrant further investigation.
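As a starting point for that investigation, here is an untested-against-a-real-kernel Python sketch that walks netlink attributes and looks for a conntrack ID nested inside NFQA_CT.  The attribute layout (16-bit length, 16-bit type, 4-byte alignment) is the standard netlink TLV format.  The numeric values for NFQA_CT and CTA_ID, and the assumption that the ID is a big-endian 32-bit integer, are from my reading of the kernel's uapi headers and should be checked against nfnetlink_queue.h and nfnetlink_conntrack.h before relying on them.  Whether the kernel includes the attribute at all is exactly the open question.

```python
import struct

NLA_HDR = struct.Struct("=HH")  # nla_len, nla_type in host byte order
NLA_TYPE_MASK = 0x3FFF          # strips the NESTED / NET_BYTEORDER flag bits
NFQA_CT = 11                    # assumed value, verify against the headers
CTA_ID = 12                     # assumed value, verify against the headers


def nla_align(n: int) -> int:
    """Round n up to the 4-byte netlink attribute alignment."""
    return (n + 3) & ~3


def parse_attrs(buf: bytes) -> dict:
    """Split a buffer of netlink attributes into {type: payload}."""
    attrs = {}
    off = 0
    while off + NLA_HDR.size <= len(buf):
        length, atype = NLA_HDR.unpack_from(buf, off)
        if length < NLA_HDR.size or off + length > len(buf):
            break  # malformed or truncated attribute: stop parsing
        attrs[atype & NLA_TYPE_MASK] = bytes(buf[off + NLA_HDR.size:off + length])
        off += nla_align(length)
    return attrs


def conntrack_id(nfqueue_attrs: bytes):
    """Return the flow ID nested under NFQA_CT, or None if it is absent."""
    ct = parse_attrs(nfqueue_attrs).get(NFQA_CT)
    if ct is None:
        return None
    raw = parse_attrs(ct).get(CTA_ID)
    if raw is None or len(raw) != 4:
        return None
    return struct.unpack("!I", raw)[0]


def make_attr(atype: int, payload: bytes) -> bytes:
    """Build one padded attribute (only used to fabricate test input)."""
    length = NLA_HDR.size + len(payload)
    return NLA_HDR.pack(length, atype) + payload + b"\0" * (nla_align(length) - length)


# Synthetic message: NFQA_CT (with the nested flag set) containing a CTA_ID.
inner = make_attr(CTA_ID, struct.pack("!I", 0xDEADBEEF))
message = make_attr(NFQA_CT | 0x8000, inner)
flow_id = conntrack_id(message)
```

The input here is fabricated, so this only demonstrates the parsing; it does not show that a real NFQUEUE message carries the attribute.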