NIC/driver/channel bonding tests
siegert at sfu.ca
Fri Nov 17 18:51:31 PST 2000
I've tested a bunch of ethernet card/driver combinations. Here are the
results. I hope they may be useful.
Ethernet Card and Driver Test
The following combinations of ethernet cards and drivers were tested under
Linux (RedHat 6.2; kernel 2.2.16):
Realtek RTL8139 / rtl8139.o (version 1.07)
LinkSys LNE100Tx / tulip.o (version 0.91g-ppc)
3Com 905B / 3c59x.o (high-performance variant poll-2.2.17-pre18.c)
3Com 905B / 3c90x.o (version 1.0.0i)
Intel EtherExpressPro 100 / eepro100.o (version 1.09j-t)
3 x RTL8139 / rtl8139.o, channel bonded
3 x 3Com 905B / 3c59x.o, channel bonded
All tests were done using cross-over cables.
First test: two identically configured PIII/600MHz
tcp test: ./netperf -H p600 -i 10,2 -I 99,5 -l 60
udp test: ./netperf -H p600 -i 10,2 -I 99,5 -l 60 -t UDP_STREAM -- -m 8192 -s 32768 -S 32768
(for the udp test the throughput at the receiving end is reported)
All numbers in Mbit/s.
      RTL8139  LNE100Tx  3c59x  3c90x  EEPro100  3xRTL8139  3x3c59x
tcp |   85.67     93.72  94.09  94.10     93.38     228.37   279.55
udp |   87.93     95.75  95.82  83.85     95.75     127.47   266.15
Second test: Pentium 166MHz and dual PII/400MHz
tcp test: ./netperf -H p166 -i 10,2 -I 99,5 -l 60
udp test: ./netperf -H p166 -i 10,2 -I 99,5 -l 60 -t UDP_STREAM -- -m 8192 -s 32768 -S 32768
tcp test: ./netperf -H p400 -i 10,2 -I 99,5 -l 60
                         RTL8139  LNE100Tx  3c59x  3c90x  EEPro100
2xPII/400 -> P166, tcp |   62.08     65.24  84.62  54.73     61.23
P166 -> 2xPII/400, tcp |   66.36     68.19  92.91  63.76     85.35
P166 -> 2xPII/400, udp |   88.76     85.33  95.40  58.95     95.76
Notes:

1. For channel bonding use the bonding.c module from the 2.2.17 kernel even
if you are using a 2.2.16 kernel. Otherwise bad things (a kernel oops)
happen. I found out about this the hard way:
"/etc/rc.d/init.d/network restart" will crash your machine so badly that
it won't even reboot (it hangs on unmounting /proc with "device busy").
Only power cycling the box will bring it back to life. The problem is
in the bonding module itself: "ifconfig bond0 down; rmmod bonding" yields
the same result (oops).
The 2.2.17 module seems to fix this.
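For reference, this is roughly the setup involved (a sketch, not a drop-in
recipe: the 2.2.16-22 version string and the module install path below are
assumptions from my own box and may differ elsewhere):

```shell
# Build bonding.o in an unpacked 2.2.17 source tree, then install it
# over the 2.2.16 module for the running kernel:
#   cp linux-2.2.17/drivers/net/bonding.o \
#      /lib/modules/2.2.16-22/net/bonding.o
#   depmod -a

# /etc/conf.modules -- have the bonding driver loaded automatically
# whenever bond0 is brought up:
alias bond0 bonding
```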
2. The 3c59x.c from the 2.2.17 kernel produced basically the same results
as the high-performance variant used in the tests above. Somehow these
tests don't seem to exercise the performance-enhancing "polling" modes of
the new driver. Does anybody have more information on this?
The 3c59x.c from kernels < 2.2.17 has bugs and should not be used.
3. The 3Com 3c90x.c driver has given me nothing but problems, particularly
in connection with NFS. Under heavy NFS load the interface would simply
lock up and only "/etc/rc.d/init.d/network restart" would solve the
problem. As the tests above show, the performance of the 3c90x driver
is much worse than that of the 3c59x driver.
4. Channel bonding with the 3c59x driver works without any problems: just
create /etc/sysconfig/network-scripts/ifcfg-bond0 and set up the
/etc/sysconfig/network-scripts/ifcfg-eth1, etc. files. That's all
that is required. With the RealTek cards/driver this fails because
the MAC addresses are not copied correctly. The only way out is to
change the real MAC addresses of the ethernet cards so that they are
all equal; Laurent Itti described on this list how to do this.
Channel bonding with the 3c90x driver fails for similar reasons
(in that case not even ifconfig reports identical MAC addresses).
Also, with the RealTek cards the connection would occasionally die
and I had to run "/etc/rc.d/init.d/network restart". This did not
happen with the 3C905B/3c59x cards/driver.
I only had two tulip and Intel cards, so I could not test channel
bonding with those cards.
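As a concrete sketch of point 4 (the IP address, netmask, and slave device
names are placeholders, and the exact set of keys the Red Hat 6.2 ifup
scripts honor may vary):

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 -- the master carries the IP
DEVICE=bond0
IPADDR=192.168.1.1        # placeholder address
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth1 -- one such file per slave
DEVICE=eth1
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```

The manual equivalent, useful for testing before touching the boot scripts,
is "ifconfig bond0 192.168.1.1 up" followed by "ifenslave bond0 eth1" (and
so on) for each slave interface.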
Conclusions:

1. The RealTek cards were by far the worst in this test.
Heavy udp load freezes the machine. The three channel-bonded RealTek cards
sometimes locked up the machine so severely that only a hard reboot
(on/off switch) would bring it back. They are cheap, but it seems that
you get what you pay for.
2. The results for the tulip, 3Com, Intel cards for PIII/600 -> PIII/600
do not differ significantly. However, they differ in the tests from
the P166 to the dual PII/400. The significance of this test is the
following: the P166 does not have the cpu power to handle 100Mbit/s.
Hence, the transfer rates in this case are not limited by the highest
throughput a particular card/driver combination can handle, but by
the cpu. However, some of the cards are "smarter" than others as they
can offload some of the cpu tasks. Therefore, a higher throughput in
this test indicates a "smarter" card. This should be particularly
important when you channel bond the cards. If the cpu is 100% busy
with maintaining a high throughput, it is impossible to do computation
and communication in parallel. In this area the 3C905B/3c59x outperforms
all the other card/driver combinations.
3. Channel bonding three 3C905B using the 3c59x driver works very well: the
throughput is basically three times the throughput of a single card.
Channel bonding the RealTek cards is out of the question: they show
a horrendous packet loss under udp, which would bring NFS to a
grinding halt. Also, the reliability is poor
(the uptime of our beowulf is currently 150 days; and that's only
because I had to upgrade the kernel: cf. Linux capabilities bug).
I'm planning to upgrade my cluster to 3 x 100BaseT using 3C905B and the
3c59x driver :-)
[This was the main purpose of this exercise].
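For what it's worth, the bonded-3c59x tcp numbers from the first table
scale almost linearly; a one-liner to check the factor:

```shell
# Ratio of 3x-bonded 3C905B tcp throughput to a single card
# (numbers taken from the first table above).
awk 'BEGIN { printf "%.2f\n", 279.55 / 94.09 }'
# prints 2.97, i.e. close to the ideal factor of 3
```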
Academic Computing Services phone: (604) 291-4691
Simon Fraser University fax: (604) 291-4242
Burnaby, British Columbia email: siegert at sfu.ca
Canada V5A 1S6