Channel bonding: working combinations?
Pfenniger Daniel
daniel.pfenniger at obs.unige.ch
Sun Jan 21 23:58:55 PST 2001
Hi!
I am trying to set up channel bonding on our cluster, but I have run into a
few problems that may interest people on the list.
Linux kernel: 2.2.18 or 2.4.0, compiled with gcc 2.95.2 (RedHat 6.2)
Motherboard: ASUS P2B-D (BX chipset)
Procs: Pentium II 400 dual
Ethernet cards: tulip-based, with DS21140 and DS21143 chips. They work well
when not bonded.
Switches: 2 Foundry FastIron II
Drivers: tulip.o or old_tulip.o, as modules, supplied with the official kernel
Documentation: in /usr/src/linux-2.2.18/Documentation/networking/bonding.txt
(BTW this file is not provided in kernel 2.4.0)
I have strictly followed the instructions in bonding.txt.
Every card has a distinct IRQ.
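For reference, the bring-up sequence described in bonding.txt, with my own
interface names and addresses substituted, is roughly the following (a sketch,
not a verbatim quote from the file):

  modprobe bonding
  ifconfig bond0 192.168.2.64 netmask 255.255.255.0 up
  ./ifenslave bond0 eth1
  ./ifenslave bond0 eth2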
The first problem is that ifconfig shows neither a hardware address nor an IP
address for bond0, whether at boot or when configured interactively (both are
zero). I can force a hardware address by setting it manually:
ifconfig bond0 192.168.2.64 hw ether 00:40:05:A1:D9:09 up
What I do not know is how to force this hardware address automatically from the
ifcfg-bond0 file.
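As a stopgap, I suppose the address could be forced from /etc/rc.d/rc.local once
the network scripts have run, along these lines (an untested sketch, reusing the
address shown above):

  /sbin/ifconfig bond0 hw ether 00:40:05:A1:D9:09

but a clean way of doing it in ifcfg-bond0 itself would be preferable.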
Incidentally, there are several different versions of ifenslave.c on the net,
all with the same version number (v0.07, 9/9/97, Donald Becker,
becker at cesdis.gsfc.nasa.gov).
I have taken the version included with the bonding-0.2.tar.gz tarball.
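I compile it more or less as suggested in bonding.txt (the exact flags and
include path may differ on other systems):

  gcc -Wall -O -I/usr/src/linux/include ifenslave.c -o ifenslave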
Starting channel bonding manually, I get the following (eth0 is assigned to
another network):
bond0     Link encap:Ethernet  HWaddr 00:40:05:A1:D9:09
          inet addr:192.168.2.64  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:108 errors:38 dropped:0 overruns:0 frame:0
          TX packets:6 errors:5 dropped:0 overruns:0 carrier:15
          collisions:0 txqueuelen:0

eth1      Link encap:Ethernet  HWaddr 00:40:05:A1:D9:09
          inet addr:192.168.2.64  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:108 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:18 Base address:0xb800

eth2      Link encap:Ethernet  HWaddr 00:40:05:A1:D9:09
          inet addr:192.168.2.64  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:38 dropped:0 overruns:0 frame:0
          TX packets:0 errors:5 dropped:0 overruns:0 carrier:15
          collisions:0 txqueuelen:100
          Interrupt:17 Base address:0xb400
A ping to another node bonded in the same way then produces varying results:
- a complete freeze, requiring a reset;
- ping hangs, and ctrl-c stops it;
- ping works, at almost double speed.
When ping works, netperf -H node may be almost twice as fast (175 Mbit/s) as
single-channel communication (94 Mbit/s), or much slower (10 or 25 Mbit/s),
even though ping indicates improved communication times.
In conclusion, channel bonding with this configuration appears unreliable.
Several messages reporting problems with the current channel bonding capability
of the Linux kernel have been posted on this list, as well as on the tulip list
about the tulip drivers. It would therefore be useful if people with working
combinations of kernel (is 2.2.17 better?), NIC/driver (which tulip version?),
etc., could share their detailed working setups.
I am sure this would be much appreciated by those wanting to bond their Beowulf.
Daniel Pfenniger