3x100Mbps: summary

Laurent Itti itti at cco.caltech.edu
Sat Oct 7 19:54:57 PDT 2000

Hi all -

Thanks a lot for the very good advice and support. I enclose a brief
summary of our experience with triple channel bonding.  As a last resort
I read the documentation for the motherboards and realized that the
interrupts were shared among several devices because of the physical
placement of the boards in the PCI slots.  Moving one board from one slot
to another solved the problem and increased performance!

3xRTL8139 channel bonding:
- setup:
   * install 3 Realtek RTL8139 10/100BTX boards in each node
   * you will need 3 separate switches for bonding, as it requires
        that the 3 boards in each node have the same MAC address
        (or you can use a single switch that supports trunking and
        is not confused by seeing the same MAC address 3 times
        for each node)
   * connect all boards No. 1 to the first switch, all boards No. 2
        to second switch, all boards No. 3 to third switch
   * check IRQ assignment; you will gain some performance by having
        each board on a separate IRQ. If two boards share an IRQ,
        it may be because they are physically placed in PCI slots
        that share IRQs. See your motherboard manual and try
        shuffling your 3 boards in the available PCI slots.
        Putting 1 board in each of PCI slots 1, 2, and 3 worked for us.
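
The IRQ check in the last setup step can be scripted. Below is a minimal
sketch in Python, assuming the uniprocessor /proc/interrupts layout shown
later in this message (IRQ number, count, controller, device names) and
assuming that devices sharing an IRQ are listed on one line separated by
commas. The function name shared_irqs and the sample listing are made up
for illustration; the sample is not output from our machines.

```python
def shared_irqs(text):
    """Return {irq: [devices]} for IRQ lines that list more than one device."""
    shared = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip NMI/ERR lines and any header line
        fields = rest.split(None, 2)  # count, controller, device names
        if len(fields) < 3:
            continue
        devices = [d.strip() for d in fields[2].split(",")]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared


# Hypothetical listing in which eth0 and eth1 share IRQ 11.
sample = """\
 10:    2172666          XT-PIC  eth2
 11:    4369534          XT-PIC  eth0, eth1
NMI:          0
"""

print(shared_irqs(sample))
```

On a node you would pass it the real file instead, e.g.
shared_irqs(open("/proc/interrupts").read()); an empty result means no
two devices share an IRQ.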

- bonding:
   * enslaving the cards does not seem to succeed in setting the MAC
        address on the RTL8139 boards (though ifconfig reports all
        MAC addresses as equal).  You can however use Donald
        Becker's "rtl8139-diag" program to permanently set them to be
        equal (see http://www.scyld.com/network/rtl8139.html).
        NOTE: the code seems to be missing a few lines that
        would actually write the new address to the eeprom; adding
        the following worked for us:

   after:  [line 569]
	/* The user will usually want to see the interpreted EEPROM contents. */
	if (show_eeprom)
	if (show_eeprom > 1) {
	if (set_hwaddr) {
	  printf("\n ****** OK, setting the HWADDR! *****\n");
	  do_update(ioaddr, eeprom_contents, 7, "hw1",
	  do_update(ioaddr, eeprom_contents, 8, "hw2",
	  do_update(ioaddr, eeprom_contents, 9, "hw3",
	  printf(" ****** New HWADDR set! *****\n");
   then run "rtl8139-diag -ee -H 10:20:30:00:00:XX -w", for example,
   with XX your node number (you can choose anything for the first 5
   bytes). CAUTION: that will be the new MAC address on all RTL8139
   boards in your machine. Use -p <ioaddr> to set just one.  You have
   to run this TWICE for it to really work. Check the new MAC addresses
   with "rtl8139-diag -ee".

   after that, the bonding instructions given in your kernel
   documentation will work
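
To avoid typos when doing this on many nodes, the per-node address and
command line can be generated. This is only a sketch: the helper names
node_mac and diag_command are hypothetical, and writing the node number
as two hex digits is an assumption (you may prefer to type the decimal
digits directly). It only builds the command string using the flags
quoted above; it does not run anything. The one check it makes is that
the first byte is even, since an odd first byte marks a multicast
address.

```python
def node_mac(node, prefix="10:20:30:00:00"):
    """Build a per-node MAC address from a 5-byte prefix and a node number."""
    if not 0 <= node <= 0xFF:
        raise ValueError("node number must fit in one byte")
    first_octet = int(prefix.split(":")[0], 16)
    if first_octet & 1:
        # The lowest bit of the first byte is the multicast bit.
        raise ValueError("first byte is odd: that marks a multicast address")
    # Assumption: the node number is written as two hex digits.
    return "%s:%02x" % (prefix, node)


def diag_command(node):
    """Return the rtl8139-diag command line for this node (run it twice)."""
    return "rtl8139-diag -ee -H %s -w" % node_mac(node)


for node in (1, 11):
    print(diag_command(node))
```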

- performance:
  * check out netperf.org for a test suite and test results.
  * a single board yields around 94 Mbps throughput
  * bonded dual-board setups (with another brand for the boards) have
    been reported to yield around 187 Mbps throughput
  * with 3 boards we start seeing diminishing returns but still
    significantly improved performance over 2 boards:

  setup: Abit SE6(i815), PIII-733EB, 256MB PC133, 30GB UATA/100, 3xRTL8139
  linux: Mandrake 7.2beta3 (Sept 22, 2000), 2.2.17mdk kernel;
         after some experimentation, the rtl8139 driver seems more
         stable than the 8139too driver (both are included in the
         distribution; edit /etc/modules.conf to choose; we can
         still crash the drivers by running the tests below for
         very long periods...)
  state: normal boot and runlevel, lots of processes running, i.e.,
         "real-life" performance test.  Node n11 is running "netserver".

[root@n01 /root]# netperf -l 60 -H n11
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
 65535  65535  65535    60.00     244.06

[root@n01 /root]# netperf -H n11 -t UDP_STREAM -- -m 10240
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

 65535   10240   9.99        32503      0     266.49
 65535           9.99        32319            264.98

By playing with the message size we can get up to 280 Mbps on UDP send
(first line), but the UDP receive rate (second line) then drops. I'll
have to read netperf's doc to see what that means ;-)
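
To put these numbers in perspective, here is a small Python sketch that
compares each measurement with three times the single-board figure. It
assumes the "around 94 Mbps" single-board throughput quoted above as the
per-channel ceiling; that is an approximation, not a measured limit for
these nodes, and the function name efficiency is made up for
illustration.

```python
SINGLE_BOARD_MBPS = 94.0  # approximate single-board throughput quoted above
BOARDS = 3


def efficiency(measured_mbps, boards=BOARDS, single=SINGLE_BOARD_MBPS):
    """Fraction of (boards x single-board throughput) actually achieved."""
    return measured_mbps / (boards * single)


# Throughput figures from the netperf runs above, in 10^6 bits/sec.
results = {"TCP stream": 244.06, "UDP send": 266.49, "UDP receive": 264.98}

for name, mbps in results.items():
    print("%-12s %7.2f Mbps  %5.1f%% of 3 x 94" % (name, mbps, 100 * efficiency(mbps)))
```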

[root@n01 /root]# cat /proc/interrupts
  0:     401225          XT-PIC  timer
  1:          8          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  8:          1          XT-PIC  rtc
 10:    2172666          XT-PIC  eth2
 11:    2187063          XT-PIC  eth0
 13:          1          XT-PIC  fpu
 14:       7001          XT-PIC  ide0
 15:    2182471          XT-PIC  eth1
NMI:          0

interrupts are nicely balanced among the 3 boards.
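
A quick way to quantify "nicely balanced" is to compare each board's
interrupt count with the mean of the three. The sketch below uses the
counts from the listing above; the function name imbalance is made up
for illustration.

```python
# Interrupt counts per interface, copied from the /proc/interrupts listing.
counts = {"eth0": 2187063, "eth1": 2182471, "eth2": 2172666}


def imbalance(counts):
    """Largest relative deviation of any count from the mean count."""
    mean = sum(counts.values()) / len(counts)
    return max(abs(c - mean) for c in counts.values()) / mean


print("largest deviation from the mean: %.2f%%" % (100 * imbalance(counts)))
```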

- conclusion:
  * the setup was not easy, but probably because it was a first time
  * we hope this will help future setups
  * performance is a bit below the theoretical maximum, but for the
      cost, I think it's a killer!  You might improve it by using
      crossover cables instead of our cheap switches (but that would
      no longer qualify as a real-life test) and by tuning your kernel.
  * cost per node: 3 * (    $7.60    [RTL8139 board]
                         +  $0.62    [3ft Cat5e 350MHz certified cable]
                         + $10.83    [1 port in a $260 24-port NWAY switch]
                       ) = $57.15    [all prices from www.kristamicro.com,
                           ------     where we bought the parts]

  * many 10/100BTX ethernet boards sell for more (and you still have to buy
    cables and 1 switch), so overall our experience with bonding is
    a great success.  Thanks to Donald Becker for adding this capability
    to Linux and to all who helped when we had problems setting up!
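
For anyone redoing the cost estimate with their own prices, the
arithmetic above can be written as a short Python sketch. It works in
cents to avoid floating-point rounding surprises; the prices are the ones
listed above, and the variable names are made up for illustration.

```python
BOARD_CENTS = 760       # one RTL8139 board
CABLE_CENTS = 62        # one 3ft Cat5e cable
SWITCH_CENTS = 26000    # one 24-port NWAY switch
SWITCH_PORTS = 24
CHANNELS = 3            # boards (and switch ports) per node

# Each channel uses one port, i.e. 1/24 of a switch, rounded to the cent.
port_cents = round(SWITCH_CENTS / SWITCH_PORTS)
per_channel_cents = BOARD_CENTS + CABLE_CENTS + port_cents
per_node_cents = CHANNELS * per_channel_cents

print("per port:    $%.2f" % (port_cents / 100))
print("per channel: $%.2f" % (per_channel_cents / 100))
print("per node:    $%.2f" % (per_node_cents / 100))
```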
