Keys:

  1. Make sure =iptables= does not interfere with the traffic:

     #+begin_src bash
       # Accept the incoming UDP traffic on port 4321.
       iptables -I INPUT 1 -p udp --dport 4321 -j ACCEPT
       # Skip connection tracking for that traffic.
       iptables -t raw -I PREROUTING 1 -p udp --dport 4321 -j NOTRACK
     #+end_src
    
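  2. The first bottleneck
     + All packets are received by a single RX queue, as the per-queue counters from =ethtool -S= show.
     + How to solve: following [[id:C471A6FF-7F4E-4E23-B070-14CE146BFA14][Multi-queue NICs]], change the hash algorithm with =ethtool= so that UDP flows are hashed on IP addresses and ports:
       #+begin_src bash
         # s/d: source/destination IP; f/n: bytes 0-1 and 2-3 of the layer-4 header (source/destination port)
         ethtool -N eth2 rx-flow-hash udp4 sdfn
       #+end_src
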
  3. Use multiple threads, NUMA-aware, and multiple receiver IPs so that the multi-queue hash algorithm spreads the flows across RX queues. Note also that there is lock contention on the UDP receive buffer; see Rivera, Diego, Eduardo Acha, Jose Piquer, and Javier Bustos-Jimenez. “Analysis of Linux UDP Sockets Concurrent Performance.” In 2014 33rd International Conference of the Chilean Computer Science Society (SCCC), 65–69. Talca: IEEE, 2014. https://doi.org/10.1109/SCCC.2014.8.
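
     One way to make a receiver NUMA-aware is to restrict it to the CPUs of a single node. The sketch below is an assumption about how that could be done from Python on Linux, not the setup used in these notes: it reads a node's CPU list from sysfs and applies it with =os.sched_setaffinity=. The function names =parse_cpulist=, =node_cpus= and =pin_to_node= are hypothetical.

     #+begin_src python
       import os


       def parse_cpulist(text):
           """Parse a sysfs CPU list such as '0-3,8-11' into a set of CPU ids."""
           cpus = set()
           for part in text.strip().split(","):
               if not part:
                   continue  # empty input or trailing comma
               low, _, high = part.partition("-")
               cpus.update(range(int(low), int(high or low) + 1))
           return cpus


       def node_cpus(node):
           """Return the CPUs that belong to the given NUMA node (Linux sysfs)."""
           with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
               return parse_cpulist(f.read())


       def pin_to_node(node):
           """Restrict the caller to the CPUs of one NUMA node and return them."""
           cpus = node_cpus(node)
           os.sched_setaffinity(0, cpus)  # 0 means "the caller"
           return cpus
     #+end_src

     A receiver would call =pin_to_node= once at start-up, ideally with the node that the NIC is attached to.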

  4. =SO_REUSEPORT= to avoid the lock on the UDP receive buffer.

    When this flag is set on a socket descriptor, Linux will allow many processes to bind to the same port. In fact, any number of processes will be allowed to bind and the load will be spread across them.
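
    As a minimal sketch of this behaviour (not taken from the original notes), the Python snippet below opens several UDP sockets on one port with =SO_REUSEPORT=. The helper names =reuseport_socket= and =open_receivers= and the loopback address are assumptions for illustration; in a real receiver each socket would usually belong to its own process or thread.

    #+begin_src python
      import socket


      def reuseport_socket(port, host="127.0.0.1"):
          """Create a UDP socket with SO_REUSEPORT set and bind it to host:port."""
          sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
          # The option must be set before bind() on every socket sharing the port.
          sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
          sock.bind((host, port))
          return sock


      def open_receivers(count, port=0, host="127.0.0.1"):
          """Open `count` sockets on the same port; port 0 lets the kernel choose."""
          first = reuseport_socket(port, host)
          chosen_port = first.getsockname()[1]
          others = [reuseport_socket(chosen_port, host) for _ in range(count - 1)]
          return [first] + others


      if __name__ == "__main__":
          receivers = open_receivers(4)
          for sock in receivers:
              print("fd", sock.fileno(), "bound to port", sock.getsockname()[1])
              sock.close()
    #+end_src

    By default the kernel picks the receiving socket from a hash of the datagram's addresses and ports, so packets from one sender flow keep arriving on the same socket.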

    With SO_REUSEPORT each of the processes will have a separate socket descriptor. Therefore each will own a dedicated UDP receive buffer. This avoids the contention issues previously encountered: