3x100Mbps: summary
Laurent Itti
itti at cco.caltech.edu
Sat Oct 7 19:54:57 PDT 2000
Hi all -
Thanks a lot for the very good advice and support. I enclose a brief summary
of our experience with triple channel bonding. As a last resort I read
the documentation for the motherboards and realized that interrupts were
shared among several devices because of how the boards were distributed
across the PCI slots. Moving one board to a different slot solved the
problem and increased performance!
3xRTL8139 channel bonding:
--------------------------
- setup:
  * install 3 Realtek RTL8139 10/100BTX boards in each node
  * you will need 3 separate switches for bonding, because bonding
    requires that the 3 boards in each node have the same MAC
    address (or you can use a single switch that supports trunking
    and is not confused by seeing the same MAC address 3 times
    for each node)
  * connect all boards No. 1 to the first switch, all boards No. 2
    to the second switch, all boards No. 3 to the third switch
  * check the IRQ assignment; you will gain some performance by having
    each board on a separate IRQ. If two boards share an IRQ,
    it may be because they are physically placed in PCI slots
    that share IRQs. See your motherboard manual and try
    shuffling your 3 boards among the available PCI slots.
    Putting 1 board in each of PCI slots 1, 2, and 3 worked for us.
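
The IRQ check above can be automated instead of reading the listing by eye. The following is only a sketch: the function name count_shared_eth_irqs is made up for illustration, and it assumes that the kernel prints devices sharing an IRQ on a single /proc/interrupts line, separated by commas (e.g. "eth0, eth2").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the first len bytes of line contain the substring "eth". */
static int line_mentions_eth(const char *line, size_t len)
{
    for (size_t i = 0; i + 3 <= len; i++)
        if (strncmp(line + i, "eth", 3) == 0)
            return 1;
    return 0;
}

/*
 * Hypothetical helper: count the lines of a /proc/interrupts listing on
 * which an ethN device shares its IRQ with at least one other device.
 * Assumption: devices sharing an IRQ appear comma-separated on one line.
 */
int count_shared_eth_irqs(const char *text)
{
    int shared = 0;

    assert(text != NULL);
    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);

        /* A line naming an eth device plus a comma lists several devices. */
        if (line_mentions_eth(text, len) && memchr(text, ',', len) != NULL)
            shared++;

        text += len;
        if (*text == '\n')
            text++;
    }
    return shared;
}
```

Read /proc/interrupts into a buffer and pass it in; a nonzero result suggests trying a different PCI slot for one of the boards.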
- bonding:
  * enslaving the cards does not seem to succeed in setting the MAC
    address on the RTL8139 boards (even though ifconfig reports all
    MAC addresses as equal). You can, however, use Donald
    Becker's "rtl8139-diag" program to permanently set them to be
    equal (see http://www.scyld.com/network/rtl8139.html).
    NOTE: the code seems to be missing a few lines that
    would actually write the new address to the EEPROM; adding
    the following worked for us:
    after: [line 569]

        /* The user will usually want to see the interpreted EEPROM contents. */
        if (show_eeprom)
            parse_eeprom(eeprom_contents);
        if (show_eeprom > 1) {
            ...
        }

    add:

        if (set_hwaddr) {
            printf("\n ****** OK, setting the HWADDR! *****\n");
            do_update(ioaddr, eeprom_contents, 7, "hw1",
                      ((int)(new_hwaddr[1]))*256+((int)(new_hwaddr[0])));
            do_update(ioaddr, eeprom_contents, 8, "hw2",
                      ((int)(new_hwaddr[3]))*256+((int)(new_hwaddr[2])));
            do_update(ioaddr, eeprom_contents, 9, "hw3",
                      ((int)(new_hwaddr[5]))*256+((int)(new_hwaddr[4])));
            printf(" ****** New HWADDR set! *****\n");
        }
-----------------
    Then run "rtl8139-diag -ee -H 10:20:30:00:00:XX -w", for example,
    with XX your node number. The first 5 bytes are your choice, but
    keep the first byte even so that the address remains a unicast
    address. CAUTION: that will be the new MAC address on all RTL8139
    boards in your machine. Use -p <ioaddr> to set just one. You have
    to run this TWICE for it to really work. Check the new MAC addresses
    with "rtl8139-diag -ee".
    After that, the instructions given in your kernel documentation
    (/usr/src/linux/Documentation/networking/bonding.txt) will work.
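
To make the byte order in the patch above easier to follow, here is a small standalone sketch of the same arithmetic. The function names parse_mac and mac_to_eeprom_words are hypothetical and not part of rtl8139-diag; the packing (second byte of each pair times 256 plus the first) and the EEPROM word offsets 7, 8, 9 are taken from the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Parse "aa:bb:cc:dd:ee:ff" into 6 bytes. Returns 0 on success, -1 on error. */
int parse_mac(const char *s, unsigned char mac[6])
{
    unsigned int b[6];
    char trailing;
    int i;

    assert(s != NULL && mac != NULL);
    /* Exactly 6 hex fields must convert; a trailing character makes it 7. */
    if (sscanf(s, "%2x:%2x:%2x:%2x:%2x:%2x%c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &trailing) != 6)
        return -1;
    for (i = 0; i < 6; i++)
        mac[i] = (unsigned char)b[i];
    return 0;
}

/*
 * Pack the 6 MAC bytes into three 16-bit words, low byte first, using the
 * same arithmetic as the patch: words[0..2] go to EEPROM offsets 7, 8, 9.
 */
void mac_to_eeprom_words(const unsigned char mac[6], unsigned short words[3])
{
    int i;

    assert(mac != NULL && words != NULL);
    for (i = 0; i < 3; i++)
        words[i] = (unsigned short)(mac[2 * i + 1] * 256 + mac[2 * i]);
}

/* A unicast address has the lowest bit of its first byte clear. */
int mac_is_unicast(const unsigned char mac[6])
{
    return (mac[0] & 1) == 0;
}
```

For node 11 (0b in hex), "10:20:30:00:00:0b" packs to the words 0x2010, 0x0030 and 0x0b00.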
- performance:
  * check out netperf.org for a test suite and test results.
  * a single board yields around 94 Mbps throughput
  * bonded dual-board setups (with another brand of boards) have
    been reported to yield around 187 Mbps throughput
  * with 3 boards we start seeing diminishing returns, but performance
    is still significantly better than with 2 boards:

    setup: Abit SE6 (i815), PIII-733EB, 256MB PC133, 30GB UATA/100, 3xRTL8139
    linux: Mandrake 7.2beta3 (Sept 22, 2000), 2.2.17mdk kernel.
           After some experimentation, the rtl8139 driver seems to be
           more stable than the 8139too driver (both are included in
           the distribution; edit /etc/modules.conf to choose). We can
           still crash the drivers by running the tests below for very
           long periods...
    state: normal boot and runlevel, lots of processes running, i.e.,
           a "real-life" performance test. Node n11 is running "netserver".

[root@n01 /root]# netperf -l 60 -H n11
TCP STREAM TEST to n11
Recv   Send   Send
Socket Socket Message  Elapsed
Size   Size   Size     Time     Throughput
bytes  bytes  bytes    secs.    10^6bits/sec

65535  65535  65535    60.00    244.06

[root@n01 /root]# netperf -H n11 -t UDP_STREAM -- -m 10240
UDP UNIDIRECTIONAL SEND TEST to n11
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

65535   10240    9.99        32503      0   266.49
65535            9.99        32319          264.98

By playing with the message size we can get up to 280 Mbps on UDP send
(first line), but the UDP receive rate (second line) then drops. I'll have
to read netperf's doc to see what that means ;-)

[root@n01 /root]# cat /proc/interrupts
           CPU0
  0:     401225          XT-PIC  timer
  1:          8          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  8:          1          XT-PIC  rtc
 10:    2172666          XT-PIC  eth2
 11:    2187063          XT-PIC  eth0
 13:          1          XT-PIC  fpu
 14:       7001          XT-PIC  ide0
 15:    2182471          XT-PIC  eth1
NMI:          0

interrupts are nicely balanced among the 3 boards.
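
As a rough way to quantify the diminishing returns, one can divide the measured throughput by the number of boards times the single-board figure. The sketch below does just that; the function name bonding_efficiency is made up for illustration, and the 94 Mbps baseline is the single-board figure quoted above. Note that the 187 Mbps dual-board figure was reported for another brand of boards, so that comparison is only approximate.

```c
#include <assert.h>

/*
 * Hypothetical helper: fraction of ideal linear scaling that was achieved.
 *   measured_mbps: throughput measured with all boards bonded
 *   single_mbps:   throughput of one board on its own
 *   boards:        number of bonded boards
 */
double bonding_efficiency(double measured_mbps, double single_mbps, int boards)
{
    assert(boards > 0 && single_mbps > 0.0);
    return measured_mbps / (single_mbps * boards);
}
```

With the numbers in this post, bonding_efficiency(244.06, 94.0, 3) comes to about 0.87 for the 3-board TCP test, against about 0.99 for bonding_efficiency(187.0, 94.0, 2).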
- conclusion:
  * the setup was not easy, but probably only because it was our first time
  * we hope this will help future setups
  * performance is somewhat below the theoretical 300 Mbps, but for the
    cost, I think it's a killer! You might improve on it by using
    crossover cables instead of our cheap switches (but that would no
    longer qualify as a real-life test) and by tuning your kernel.
  * cost per node: 3 * (  $7.60 [RTL8139 board]
                        +  $0.62 [3ft Cat5e 350MHz certified cable]
                        + $10.83 [1 port in a $260 24-port NWAY switch]
                       ) = $57.15 [all prices from www.kristamicro.com,
                           ------  where we bought the parts]
  * many 10/100BTX ethernet boards sell for more (and you still have to buy
    cables and 1 switch), so overall our experience with bonding is
    a great success. Thanks to Donald Becker for adding this capability
    to Linux and to all who helped when we had problems setting up!
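
For anyone who wants to redo the cost arithmetic with their own prices, here is a minimal sketch. The function name cost_per_node_cents is hypothetical; it works in integer cents to avoid floating-point rounding, and it rounds the per-port share of the switch to the nearest cent, which matches the $10.83 figure above.

```c
#include <assert.h>

/*
 * Hypothetical helper: cost per node in cents for a bonded setup where
 * each board needs its own cable and its own switch port.
 */
long cost_per_node_cents(int boards, long board_cents, long cable_cents,
                         long switch_cents, int switch_ports)
{
    long port_cents;

    assert(boards > 0 && switch_ports > 0);
    /* Per-port share of the switch price, rounded to the nearest cent. */
    port_cents = (switch_cents + switch_ports / 2) / switch_ports;
    return boards * (board_cents + cable_cents + port_cents);
}
```

With the prices listed above, cost_per_node_cents(3, 760, 62, 26000, 24) gives 5715, i.e. $57.15.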