{{tag>linux router mtu netfilter }}
=====Basic Netfilter Function Block Diagram=====
Both NFTables and IPTables use the [[https://en.wikipedia.org/wiki/Netfilter|Netfilter]] framework provided in the Linux kernel. NFTables was implemented to supersede IPTables, but because IPTables is so widely used, the transition will probably take a long time.\\
{{:linux_router:linux_netfilter.png?600|}}

The following is a basic block diagram of the Netfilter Filter and NAT (Network Address Translation) functions, which are the basic requirements for a router.

                  Incoming Packets
                         |
                  ┌────────────┐
                  │ Prerouting │
                  │   Rules    │
                  └────────────┘
                         |
                   /----------\
                   | Routing  |
        NAT        | Decision |-----------------|
                   |  Rules   |     Filter      |
                   \----------/                 |
                         |                      |
                  |------------|                |
                  |   Input    |                |
                  |   Rules    |                |
                  |------------|                |
                         |                      |
               |-------------------|       |----------|
               | Network Processes |       | Forward  |
               |   within Router   |       |  Rules   |
               |-------------------|       |----------|
                         |                      |
                  |------------|                |
                  |   Output   |                |
                  |   Rules    |                |
                  |------------|                |
                         |        FILTER        |
                         |----------------------|
                         |         NAT
                  |-------------|
                  | Postrouting |
                  |    Rules    |
                  |-------------|
                         |
                  Outgoing Packets

=====IPTables and Netfilter=====
The following is taken from DigitalOcean [[https://www.digitalocean.com/community/tutorials/a-deep-dive-into-iptables-and-netfilter-architecture|A Deep Dive into Iptables and Netfilter Architecture]]. While it focuses on iptables, the concepts are basically valid for nftables.

++++ tldr|
====IPTables Tables and Chains====
The ''iptables'' firewall uses tables to organize its rules. These tables classify rules according to the type of decisions they are used to make. For instance, if a rule deals with network address translation, it will be put into the ''nat'' table. If the rule is used to decide whether to allow the packet to continue to its destination, it would probably be added to the ''filter'' table.

Within each ''iptables'' table, rules are further organized within separate “chains”.
While tables are defined by the general aim of the rules they hold, the built-in chains represent the ''netfilter'' hooks which trigger them. Chains determine when rules will be evaluated.

The names of the built-in chains mirror the names of the ''netfilter'' hooks they are associated with:
  *''PREROUTING'': Triggered by the ''NF_IP_PRE_ROUTING'' hook.
  *''INPUT'': Triggered by the ''NF_IP_LOCAL_IN'' hook.
  *''FORWARD'': Triggered by the ''NF_IP_FORWARD'' hook.
  *''OUTPUT'': Triggered by the ''NF_IP_LOCAL_OUT'' hook.
  *''POSTROUTING'': Triggered by the ''NF_IP_POST_ROUTING'' hook.

Chains allow the administrator to control where in a packet’s delivery path a rule will be evaluated. Since each table has multiple chains, a table’s influence can be exerted at multiple points in processing. Because certain types of decisions only make sense at certain points in the network stack, not every table has a chain registered with each kernel hook.

There are only five ''netfilter'' kernel hooks, so chains from multiple tables are registered at each of the hooks. For instance, three tables have ''PREROUTING'' chains. When these chains register at the associated ''NF_IP_PRE_ROUTING'' hook, they specify a priority that dictates the order in which each table’s ''PREROUTING'' chain is called. Each of the rules inside the highest priority ''PREROUTING'' chain is evaluated sequentially before moving on to the next ''PREROUTING'' chain. We will take a look at the specific order of each chain in a moment.

====Which Tables are Available?====
Let’s step back for a moment and take a look at the different tables that ''iptables'' provides. These represent distinct sets of rules, organized by area of concern, for evaluating packets.

===The Filter Table===
The ''filter'' table is one of the most widely used tables in ''iptables''. The ''filter'' table is used to make decisions about whether to let a packet continue to its intended destination or to deny its request.
In firewall parlance, this is known as “filtering” packets. This table provides the bulk of functionality that people think of when discussing firewalls.

===The NAT Table===
The ''nat'' table is used to implement network address translation rules. As packets enter the network stack, rules in this table will determine whether and how to modify the packet’s source or destination addresses in order to impact the way that the packet and any response traffic are routed. This is often used to route packets to networks when direct access is not possible.

===The Mangle Table===
The ''mangle'' table is used to alter the IP headers of the packet in various ways. For instance, you can adjust the TTL (Time to Live) value of a packet, either lengthening or shortening the number of valid network hops the packet can sustain. Other IP headers can be altered in similar ways.

This table can also place an internal kernel “mark” on the packet for further processing in other tables and by other networking tools. This mark does not touch the actual packet, but adds the mark to the kernel’s representation of the packet.

===The Raw Table===
The ''iptables'' firewall is stateful, meaning that packets are evaluated with regard to their relation to previous packets. The connection tracking features built on top of the ''netfilter'' framework allow ''iptables'' to view packets as part of an ongoing connection or session instead of as a stream of discrete, unrelated packets. The connection tracking logic is usually applied very soon after the packet hits the network interface.

The ''raw'' table has a very narrowly defined function. Its only purpose is to provide a mechanism for marking packets in order to opt out of connection tracking.

===The Security Table===
The ''security'' table is used to set internal SELinux security context marks on packets, which will affect how SELinux or other systems that can interpret SELinux security contexts handle the packets.
These marks can be applied on a per-packet or per-connection basis.

====Relationships Between Chains and Tables====
If three tables have ''PREROUTING'' chains, in which order are they evaluated?

The following table indicates the chains that are available within each ''iptables'' table when read from left-to-right. For instance, we can tell that the ''raw'' table has both ''PREROUTING'' and ''OUTPUT'' chains. When read from top-to-bottom, it also displays the order in which each chain is called when the associated ''netfilter'' hook is triggered.

A few things should be noted. In the representation below, the ''nat'' table has been split between ''DNAT'' operations (those that alter the destination address of a packet) and ''SNAT'' operations (those that alter the source address) in order to display their ordering more clearly. We have also included rows that represent points where routing decisions are made and where connection tracking is enabled in order to give a more holistic view of the processes taking place:

|<55em 15em 8em 8em 8em 8em>|
^Tables/Chains ^ PREROUTING ^ INPUT ^ FORWARD ^ OUTPUT ^ POSTROUTING ^
|(routing decision) | | | | ✓ | |
|**raw** | ✓ | | | ✓ | |
|(connection tracking enabled) | ✓ | | | ✓ | |
|**mangle** | ✓ | ✓ | ✓ | ✓ | ✓ |
|**nat** (DNAT) | ✓ | | | ✓ | |
|(routing decision) | ✓ | | | ✓ | |
|**filter** | | ✓ | ✓ | ✓ | |
|**security** | | ✓ | ✓ | ✓ | |
|**nat** (SNAT) | | ✓ | | | ✓ |

As a packet triggers a **netfilter** hook, the associated chains will be processed as they are listed in the table above from top-to-bottom. The hooks (columns) that a packet will trigger depend on whether it is an incoming or outgoing packet, the routing decisions that are made, and whether the packet passes filtering criteria.

Certain events will cause a table’s chain to be skipped during processing. For instance, only the first packet in a connection will be evaluated against the NAT rules.
Any **nat** decisions made for the first packet will be applied to all subsequent packets in the connection without additional evaluation. Responses to NAT’ed connections will automatically have the reverse NAT rules applied to route correctly.

===Chain Traversal Order===
Assuming that the server knows how to route a packet and that the firewall rules permit its transmission, the following flows represent the paths that will be traversed in different situations:
  *Incoming packets destined for the local system: **PREROUTING -> INPUT**
  *Incoming packets destined to another host: **PREROUTING -> FORWARD -> POSTROUTING**
  *Locally generated packets: **OUTPUT -> POSTROUTING**

If we combine the above information with the ordering laid out in the previous table, we can see that an incoming packet destined for the local system will first be evaluated against the **PREROUTING** chains of the **raw**, **mangle**, and **nat** tables. It will then traverse the **INPUT** chains of the **mangle**, **filter**, **security**, and **nat** tables before finally being delivered to the local socket.

====IPTables and Connection Tracking====
We introduced the connection tracking system implemented on top of the ''netfilter'' framework when we discussed the ''raw'' table and connection state matching criteria. Connection tracking allows ''iptables'' to make decisions about packets viewed in the context of an ongoing connection. The connection tracking system provides ''iptables'' with the functionality it needs to perform “stateful” operations.

Connection tracking is applied very soon after packets enter the networking stack. The ''raw'' table chains and some sanity checks are the only logic that is performed on packets prior to associating the packets with a connection.

The system checks each packet against a set of existing connections. It will update the state of the connection in its store if needed and will add new connections to the system when necessary.
Packets that have been marked with the ''NOTRACK'' target in one of the ''raw'' chains will bypass the connection tracking routines.

===Available States===
Connections tracked by the connection tracking system will be in one of the following states:
  *''NEW'': When a packet arrives that is not associated with an existing connection, but is not invalid as a first packet, a new connection will be added to the system with this label. This happens for both connection-aware protocols like TCP and for connectionless protocols like UDP.
  *''ESTABLISHED'': A connection is changed from ''NEW'' to ''ESTABLISHED'' when it receives a valid response in the opposite direction. For TCP connections, this means a ''SYN/ACK'' and for UDP and ICMP traffic, this means a response where source and destination of the original packet are switched.
  *''RELATED'': Packets that are not part of an existing connection, but are associated with a connection already in the system are labeled ''RELATED''. This could mean a helper connection, as is the case with FTP data transmission connections, or it could be ICMP responses to connection attempts by other protocols.
  *''INVALID'': Packets can be marked ''INVALID'' if they are not associated with an existing connection and aren’t appropriate for opening a new connection, if they cannot be identified, or if they aren’t routable, among other reasons.
  *''UNTRACKED'': Packets can be marked as ''UNTRACKED'' if they’ve been targeted in a ''raw'' table chain to bypass tracking.
  *''SNAT'': This is a virtual state set when the source address has been altered by NAT operations. This is used by the connection tracking system so that it knows to change the source addresses back in reply packets.
  *''DNAT'': This is a virtual state set when the destination address has been altered by NAT operations. This is used by the connection tracking system so that it knows to change the destination address back when routing reply packets.
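
The ''NEW'' to ''ESTABLISHED'' transition described in the list above can be sketched with a toy model. This is only an illustration under simplifying assumptions: the class ''ToyConntrack'' and its ''observe'' method are hypothetical names, endpoints are plain strings, and the ''RELATED'', ''INVALID'', ''UNTRACKED'' and NAT states, as well as protocol details such as TCP flags and timeouts, are ignored. It is not how the kernel implements connection tracking.

```python
class ToyConntrack:
    """Toy model of NEW -> ESTABLISHED tracking (hypothetical, not kernel logic)."""

    def __init__(self):
        # Maps (source, destination) of the ORIGINAL direction to a state label.
        self.table = {}

    def observe(self, src, dst):
        """Record one packet and return the state of its connection."""
        if (src, dst) in self.table:
            # Packet in the original direction: state is unchanged.
            return self.table[(src, dst)]
        if (dst, src) in self.table:
            # Source and destination are switched: this is a reply,
            # so the connection becomes ESTABLISHED.
            self.table[(dst, src)] = "ESTABLISHED"
            return "ESTABLISHED"
        # No matching entry in either direction: a new connection.
        self.table[(src, dst)] = "NEW"
        return "NEW"


tracker = ToyConntrack()
first = tracker.observe("192.168.1.10:40000", "203.0.113.5:53")
reply = tracker.observe("203.0.113.5:53", "192.168.1.10:40000")
later = tracker.observe("192.168.1.10:40000", "203.0.113.5:53")
print(first, reply, later)
```

The addresses used are documentation and private example addresses chosen for illustration only.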
The states tracked in the connection tracking system allow administrators to craft rules that target specific points in a connection’s lifetime. This provides the functionality needed for more thorough and secure rules.
++++

====Some references====
  *Netfilter.org iptables how to [[https://www.netfilter.org/documentation/HOWTO/NAT-HOWTO-6.html|Saying how to mangle the packets]]
  *[[https://linux.die.net/man/8/iptables|iptables(8) - Linux man page]]
  *[[http://www.oocities.org/youssef116/writing/ratelim.html|The iptables Rate-Limiting Module]]
  *Nixcraft [[https://www.cyberciti.biz/tips/linux-iptables-9-allow-icmp-ping.html|IPTables allow or block ICMP ping request]]
  *[[http://www.microhowto.info/howto/limit_the_rate_of_inbound_tcp_connections_using_iptables.html|Limit the rate of inbound TCP connections using iptables]]
  *[[https://thelowedown.wordpress.com/2008/07/03/iptables-how-to-use-the-limits-module/|iptables: How to use the limits module]]
  *[[https://debian-administration.org/article/187/Using_iptables_to_rate-limit_incoming_connections|Using iptables to rate-limit incoming connections]]
  *The Geek Stuff:
    *[[https://www.thegeekstuff.com/2011/01/iptables-fundamentals/|Linux Firewall Tutorial: IPTables Tables, Chains, Rules Fundamentals]]
    *[[https://www.thegeekstuff.com/2011/06/iptables-rules-examples/#comments|25 Most Frequently Used Linux IPTables Rules Examples]]
    *[[https://www.thegeekstuff.com/2010/07/fail2ban-howto/|Fail2Ban Howto: Block IP Address Using Fail2ban and IPTables]]
    *[[https://www.thegeekstuff.com/scripts/iptables-rules|iptables script]]
  *[[http://www.epicvoyage.org/blog/geek-stuffiptables-spammers-are-annoying-right|Geek Stuff/iptables: Spammers are Annoying, Right?]]
  *Cisco [[http://www.ciscopress.com/articles/article.asp?p=174313&seqnum=4|General Design Considerations for Secure Networks]]
  *Oregon Tech [[http://oregontechsupport.com/articles/icmp.txt|icmp.txt]]
  *DigitalOcean [[https://www.digitalocean.com/community/tutorials/a-deep-dive-into-iptables-and-netfilter-architecture|A Deep Dive into Iptables and Netfilter Architecture]]

====PPPoE MTU Requirements====
A PPPoE connection carries additional overhead inside the standard Ethernet data field. The maximum length (MTU) of a standard Ethernet data field is limited to 1500 bytes. A standard PPPoE connection has an additional overhead of 8 bytes, which limits the MTU to 1492 bytes. However, some ISPs (internet service providers) may have additional overheads.

To determine the largest MTU, use the ping command. A ping packet has a 28-byte overhead (20 bytes for the IP header + 8 bytes for the ICMP header). So the MTU is the largest payload size that can be pinged without a fragmentation error, plus 28 bytes for the ping overhead. For a normal PPPoE connection the largest ping payload would be 1492 - 28 = 1464 bytes. (Note that a problem with this method is that the test probably passes through an existing modem router that sets its own MTU, and it is possible that this setting acts as the limiter.)
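
The arithmetic above can be checked with a short sketch. The constant and function names are hypothetical and chosen only for this illustration; the header sizes assume IPv4 without options and a plain ICMP echo request.

```python
ETHERNET_MTU = 1500   # bytes, standard Ethernet data field
PPPOE_OVERHEAD = 8    # bytes, standard PPPoE overhead
IP_HEADER = 20        # bytes, IPv4 header without options (assumption)
ICMP_HEADER = 8       # bytes, ICMP echo header


def ping_payload_for_mtu(mtu):
    """Largest ping payload (the value given to ping -s) that fits in the MTU."""
    return mtu - IP_HEADER - ICMP_HEADER


def mtu_from_ping_payload(payload):
    """MTU implied by the largest ping payload that was not fragmented."""
    return payload + IP_HEADER + ICMP_HEADER


pppoe_mtu = ETHERNET_MTU - PPPOE_OVERHEAD
print("PPPoE MTU:", pppoe_mtu)
print("Largest ping payload:", ping_payload_for_mtu(pppoe_mtu))
```

If the largest payload that gets through is smaller than 1464, ''mtu_from_ping_payload'' gives the corresponding MTU, which would suggest extra overhead somewhere on the path.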
Some command examples:
  *''ping -M do -s 1464 -c1 google.com'' (on Linux, ''-M do'' prohibits fragmentation, so an oversized packet produces an error instead of being fragmented silently)
  *''tracepath vorash.stgraber.org''

See references: [[https://ubuntuforums.org/showthread.php?t=872346|How to Optimize your Internet Connection using MTU and RWIN]], [[https://samuel.kadolph.com/2015/02/mtu-and-tcp-mss-when-using-pppoe-2/|MTU and TCP MSS when using PPPoE]], [[https://www.lifewire.com/tcp-headers-and-udp-headers-explained-817970|TCP Headers and UDP Headers Explained]], [[http://www.znep.com/~marcs/mtu/|Path MTU Discovery and Filtering ICMP]], **Cisco** [[https://www.cisco.com/c/en/us/support/docs/ip/generic-routing-encapsulation-gre/25885-pmtud-ipfrag.html|Resolve IP Fragmentation, MTU, MSS, and PMTUD Issues with GRE and IPSEC]] and [[https://supportforums.cisco.com/t5/wan-routing-and-switching/understanding-mtu-for-adsl/td-p/2363074|Understanding MTU for ADSL]], and **Wikipedia** [[https://en.wikipedia.org/wiki/IPv4#IHL|IPv4]], [[https://en.wikipedia.org/wiki/EtherType|Ethertype]], [[https://en.wikipedia.org/wiki/IEEE_802.1Q|IEEE 802.1Q]], [[https://en.wikipedia.org/wiki/Maximum_transmission_unit|Maximum transmission unit]], [[https://en.wikipedia.org/wiki/Point-to-point_protocol_over_Ethernet|Point-to-point protocol over Ethernet]], [[https://en.wikipedia.org/wiki/IPv6_packet|IPv6 packet]], [[https://en.wikipedia.org/wiki/Internet_Control_Message_Protocol_version_6|Internet Control Message Protocol version 6]], and [[https://en.wikipedia.org/wiki/Internet_Control_Message_Protocol_version_6|Path MTU Discovery]].

The MSS (maximum segment size) is normally just 40 bytes less than the MTU. The MSS is used to avoid IP fragmentation at the endpoints of TCP connections.
The MSS is just the TCP data size and excludes the IP and TCP headers, which are normally 20 bytes each. So the normal MSS for a PPPoE connection would be 1492 - 40 = 1452 bytes.

Some Ethernet data field overheads to consider:
  *PPPoE header = 8 bytes
  *IP header = 20 bytes, but can grow up to 60 bytes with options that are rarely used.
  *ICMP header = 8 bytes
  *TCP header = 20 bytes, but like IP can grow to 60 bytes long

The Ethernet data field (MTU) is limited to 1500 bytes. The following overheads in the Ethernet frame, over and above the MTU, are given for information:
  *Preamble (including the start frame delimiter) = 8 bytes
  *Destination MAC = 6 bytes
  *Source MAC = 6 bytes
  *VLAN header (optional) = 4 bytes
  *EtherType/Size = 2 bytes
  *Payload = maximum 1500 bytes (MTU)
  *CRC/FCS = 4 bytes

As can be seen above, the Ethernet frame overhead is normally a minimum of 26 bytes, or 30 bytes with VLAN (IEEE 802.1Q) tagging, so a full-size frame occupies 1526 bytes on the wire (1530 bytes with a VLAN tag).

To set the PPPoE connection MTU, edit the file ''/etc/ppp/ip-up'' (for example with ''sudo vim /etc/ppp/ip-up'') and append the following to the end of the file: ''/sbin/ifconfig ppp0 mtu 1492''.

====ICMP Filtering====
There seems to be a lot of conflicting information on filtering ICMP, arguably too much. ICMP is a fundamental component of the IP protocol suite, and simply blocking it in its entirety is poor practice. In fact, IPv6 will not function correctly without ICMPv6. Some judicious filtering and rate limiting seems the correct solution.
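
As a rough illustration of what rate limiting means here, the following is a generic token-bucket sketch: packets are allowed while tokens remain, and tokens are refilled at a fixed rate up to a burst limit. The class ''TokenBucket'' and its parameters are hypothetical names for this example, and the sketch is not claimed to mirror how netfilter implements its own rate-limiting matches.

```python
class TokenBucket:
    """Generic token-bucket rate limiter (illustrative sketch only)."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec      # tokens added per second
        self.burst = burst            # maximum number of stored tokens
        self.tokens = float(burst)    # start with a full bucket
        self.last = 0.0               # time of the previous packet, in seconds

    def allow(self, now):
        """Return True if a packet arriving at time 'now' is within the limit."""
        elapsed = now - self.last
        self.last = now
        # Refill for the time that has passed, without exceeding the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Allow 1 ICMP echo request per second with a burst of 3.
bucket = TokenBucket(rate_per_sec=1, burst=3)
arrival_times = [0, 0, 0, 0, 1, 1]
decisions = [bucket.allow(t) for t in arrival_times]
print(decisions)
```

Passing the arrival time in explicitly, rather than reading a clock, keeps the example deterministic; the first three packets use up the burst, the fourth is refused, and one more is accepted after a second has passed.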
The following is some reading on ICMP:
  *[[http://oregontechsupport.com/articles/icmp.txt|Advanced ICMP Filtering with iptables]]
  *[[https://serverfault.com/questions/84963/why-not-block-icmp|Why not block ICMP?]]
  *[[https://security.stackexchange.com/questions/22711/is-it-a-bad-idea-for-a-firewall-to-block-icmp/22713#22713|Is it a bad idea for a firewall to block ICMP?]]
  *[[https://datatracker.ietf.org/doc/draft-ietf-opsec-icmp-filtering/|Recommendations for filtering ICMP messages]]
  *[[https://ubuntuforums.org/showthread.php?t=2353951|Thread: iptables ICMP types]]
  *[[http://www.networksorcery.com/enp/protocol/icmp/msg3.htm|ICMP type 3, Destination unreachable message]]
  *[[https://community.ubnt.com/t5/EdgeMAX/Recommendations-for-filtering-ICMP-messages/td-p/560143|Recommendations for filtering ICMP messages]]
  *[[https://arstechnica.com/civis/viewtopic.php?t=1199159|ICMP and Traceroute best practices]]
  *[[https://tools.ietf.org/html/draft-ietf-opsec-icmp-filtering-04|Recommendations for filtering ICMP messages]]

<- linux_router:ipoe|Prev page ^ linux_router:start|Start page ^ linux_router:nftables|Next page ->