Howto: Enable Receive Packet Steering (RPS) on Linux 2.6.35

We run a high-bandwidth Tor exit node on a Gbit connection. Unfortunately, the NIC provided by our hoster doesn’t support MSI-X, which would let it spread its interrupt load across all cores. The latest Linux kernel, 2.6.35, adds a mechanism called Receive Packet Steering (RPS) that distributes packet processing in software instead:

[quote]This patch implements software receive side packet steering (RPS). RPS distributes the load of received packet processing across multiple CPUs.
Problem statement: Protocol processing done in the NAPI context for received packets is serialized per device queue and becomes a bottleneck under high packet load. This substantially limits pps that can be achieved on a single queue NIC and provides no scaling with multiple cores. ([url=] Software receive packet steering[/url])[/quote]
What took us a long time to figure out: even with RPS enabled, /proc/interrupts still shows only CPU0 handling the NIC interrupt. That is expected, because RPS does not move the hardware interrupt; it hands received packets to other CPUs, which then process them in their own NET_RX softirqs. To check whether RPS is working, look at /proc/softirqs instead (e.g. with watch -n1 cat /proc/softirqs):

                    CPU0       CPU1       CPU2       CPU3
          HI:          0          0          0          0
       TIMER:  480622794  476948579  460999919  467641124
      NET_TX:   25311134   27075847   27513332   27307975     <-----
      NET_RX: 1388399338 4191697027 1491556667  627387845     <-----
       BLOCK:    4632803          3     315726         29
BLOCK_IOPOLL:          0          0          0          0
     TASKLET:         21          4          8          2
       SCHED:  154913375  158601463   97907175  200790209
     HRTIMER:    1576760    2361409    1330088    1545921
         RCU:  421549961  407634645  405460584  415147363
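The absolute counters above accumulate from boot, so it is easier to see RPS at work by looking at how much NET_RX grows on each CPU over a short interval. The following is a minimal sketch that saves two snapshots of /proc/softirqs and prints the per-CPU NET_RX increase. The function names netrx_diff and netrx_sample are hypothetical helpers, not kernel tools, and the sketch assumes the layout shown above (a header row of CPU names followed by one row per softirq type), POSIX sh, awk and mktemp.

```shell
#!/bin/sh
# Hypothetical helpers: show how many NET_RX softirqs each CPU handled
# between two snapshots of /proc/softirqs. With RPS working, several CPUs
# should show non-zero deltas, while /proc/interrupts still shows CPU0 only.

# netrx_diff BEFORE AFTER: print "CPUn: delta" for every CPU column.
# Assumes line 1 of each file is the CPU header row.
netrx_diff() {
    awk '
        FNR == 1 { for (i = 1; i <= NF; i++) cpu[i] = $i; next }
        $1 == "NET_RX:" {
            for (i = 2; i <= NF; i++) {
                if (NR == FNR) {
                    start[i] = $i                                  # first snapshot
                } else {
                    printf "%s: %d\n", cpu[i - 1], $i - start[i]   # second snapshot
                }
            }
        }' "$1" "$2"
}

# netrx_sample [SECONDS]: snapshot /proc/softirqs twice (default 1 s apart)
# and print the NET_RX delta per CPU.
netrx_sample() {
    before=$(mktemp) && after=$(mktemp) || return 1
    cat /proc/softirqs > "$before"
    sleep "${1:-1}"
    cat /proc/softirqs > "$after"
    netrx_diff "$before" "$after"
    rm -f "$before" "$after"
}
```

After sourcing the file, netrx_sample 1 prints one "CPUn: delta" line per CPU; on a single-queue NIC without RPS, nearly all of the growth should land on the CPU that takes the NIC interrupt.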

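In our case, RPS was not active by default and we had to enable it explicitly. The rps_cpus file takes a hexadecimal bitmask of the CPUs allowed to process packets from that receive queue; f (binary 1111) selects CPU0 through CPU3: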

# cat /sys/class/net/eth0/queues/rx-0/rps_cpus
# echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
# cat /sys/class/net/eth0/queues/rx-0/rps_cpus
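Rather than typing the mask by hand, it can be derived from the number of CPUs. The following is a minimal sketch, assuming CPUs are numbered contiguously from 0, at most 32 CPUs (so the mask fits in one 32-bit group without comma separators), nproc from GNU coreutils, and root privileges for writing to /sys. The helper names rps_mask and enable_rps are hypothetical; the sketch also loops over every rx-* queue, which on a single-queue NIC like ours is just rx-0.

```shell
#!/bin/sh
# rps_mask N: print the hex CPU bitmask covering CPU0..CPU(N-1).
# Restricted to 1..32 CPUs so the result is a single 32-bit hex group.
rps_mask() {
    n=$1
    if [ "$n" -lt 1 ] || [ "$n" -gt 32 ]; then
        echo "rps_mask: CPU count must be between 1 and 32" >&2
        return 1
    fi
    printf '%x\n' $(( (1 << n) - 1 ))
}

# enable_rps [IFACE]: write the all-CPU mask to every receive queue of IFACE
# (default eth0) and print the value read back. Must be run as root.
enable_rps() {
    iface=${1:-eth0}
    mask=$(rps_mask "$(nproc)") || return 1
    for q in /sys/class/net/"$iface"/queues/rx-*; do
        [ -e "$q/rps_cpus" ] || continue   # skip if the glob matched nothing
        echo "$mask" > "$q/rps_cpus"
        echo "$q/rps_cpus = $(cat "$q/rps_cpus")"
    done
}
```

On a four-CPU machine like ours, rps_mask 4 prints f, the same value written by hand above, and enable_rps eth0 writes it to rx-0.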

To apply the setting on every boot, I added the echo line as an "up ..." command to the eth0 stanza in /etc/network/interfaces.
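For reference, such a stanza could look like the following sketch for Debian's ifupdown. It assumes a static configuration, and the addresses are placeholders from the 192.0.2.0/24 documentation range, not our actual setup; ifupdown runs each "up" line through /bin/sh after the interface has been brought up, so the redirection works as written.

```
# /etc/network/interfaces (Debian ifupdown); addresses are placeholders
auto eth0
iface eth0 inet static
    address 192.0.2.10
    netmask 255.255.255.0
    gateway 192.0.2.1
# Enable RPS on CPU0-CPU3 once eth0 is up
    up echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
```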

More Information

  • [url=] rps: Receive packet steering[/url]
  • [url=]Google Group Redis-DB: better multi-core functionality via per-process shared memory queues[/url]