Channel bonding: working combinations?
Martin Siegert
siegert at sfu.ca
Tue Jan 23 12:44:46 PST 2001
Hi Daniel,
I had none of your problems when using the DFE570Tx with the tulip
driver (see the other post). I actually never had to run ifenslave/ifconfig
manually; the configuration comes up reliably after a reboot or after
running "/etc/rc.d/init.d/network restart".
Hence I can only guess where your problems may be:
1. I trust that you have the line "alias bond0 bonding" in your
/etc/conf.modules (or /etc/modules.conf, whichever you are using) file
(sounds stupid, but I made that mistake once); see the first sketch
after this list.
2. You mentioned that you use eth0 for a different network. Does it use
the same driver as the other cards? If it does: how do you tell which
card your machine is recognizing as eth0? This happened to me over
and over again: when you plug in a second NIC you cannot be sure that
the new card will be eth1 - it may just as well be eth0, with the old
card coming up as eth1, creating nothing but problems.
The only way I found to figure this out is to run ping on the
network that is connected to eth0 and watch which card's lights
flash (and then swap the cards); see the second sketch below.
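
For point 1, a sketch of the relevant lines (the tulip aliases are only
my guess at your setup, adjust as needed):

    # /etc/conf.modules (or /etc/modules.conf)
    alias bond0 bonding      # load the bonding driver for bond0
    alias eth1 tulip         # the two slave cards use the tulip driver
    alias eth2 tulip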
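
For point 2, instead of watching the LEDs you can generate traffic and
watch the kernel's per-interface packet counters; the interface whose
RX/TX counts climb is the one wired to that network. A quick sketch
(the ping target is just an example host on the eth0 network):

    # terminal 1: steady traffic on the network attached to "eth0"
    ping 192.168.1.1
    # terminal 2: watch the packet counters
    watch -n 1 cat /proc/net/dev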
I hope this helps.
Cheers,
Martin
========================================================================
Martin Siegert
Academic Computing Services phone: (604) 291-4691
Simon Fraser University fax: (604) 291-4242
Burnaby, British Columbia email: siegert at sfu.ca
Canada V5A 1S6
========================================================================
On Mon, Jan 22, 2001 at 08:58:55AM +0100, Pfenniger Daniel wrote:
>
> I am trying to set up channel bonding on our cluster, but I am running
> into a few problems that may interest people on the list.
>
> Linux kernel: 2.2.18 or 2.4.0, compiled with gcc 2.95.2, (RedHat 6.2)
> Motherboard: ASUS P2B-D (BX chipset)
> Procs: Pentium II 400 dual
> Ethernet cards: tulip-based, with the DS21140 and DS21143 chips. They work
> well when not bonded.
> Switches: 2 Foundry FastIron II
> Drivers: tulip.o, or old_tulip.o as modules supplied with the official kernel
> Documentation: in /usr/src/linux-2.2.18/Documentation/networking/bonding.txt
> (BTW this file is not provided in kernel 2.4.0)
>
> I have followed the instructions in bonding.txt to the letter.
> Every card has a distinct IRQ.
>
> The first problem is that ifconfig bond0 shows neither a hardware nor an
> IP address at boot or when run interactively (both are zero).
> I can force a hardware address by setting one manually:
>
> ifconfig bond0 192.168.2.64 hw ether 00:40:05:A1:D9:09 up
>
> What I don't know is how to force the hardware address automatically
> from the ifcfg-bond0 file.
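
For what it is worth, bond0 should inherit the hardware address of the
first slave as soon as ifenslave attaches it, so forcing an address by
hand should not normally be necessary. A minimal manual bringup along
the lines of bonding.txt would be (addresses taken from your mail; eth0
is left alone since it serves another network):

    # bring up the master first, then attach the slaves;
    # bond0 takes its MAC from the first enslaved card
    ifconfig bond0 192.168.2.64 netmask 255.255.255.0 up
    ifenslave bond0 eth1
    ifenslave bond0 eth2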
>
> Incidentally, there are a few different versions of ifenslave.c on the
> net, all with the same version number (v0.07 9/9/97 Donald Becker
> (becker at cesdis.gsfc.nasa.gov)).
> I used the version included in the bonding-0.2.tar.gz tarball.
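
(ifenslave.c builds standalone; something like

    gcc -Wall -O2 -o ifenslave ifenslave.c

should do.)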
>
> Starting channel bonding manually, I get the following (eth0 is assigned
> to another network):
>
> bond0 Link encap:Ethernet HWaddr 00:40:05:A1:D9:09
> inet addr:192.168.2.64 Bcast:192.168.2.255 Mask:255.255.255.0
> UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
> RX packets:108 errors:38 dropped:0 overruns:0 frame:0
> TX packets:6 errors:5 dropped:0 overruns:0 carrier:15
> collisions:0 txqueuelen:0
>
> eth1 Link encap:Ethernet HWaddr 00:40:05:A1:D9:09
> inet addr:192.168.2.64 Bcast:192.168.2.255 Mask:255.255.255.0
> UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
> RX packets:108 errors:0 dropped:0 overruns:0 frame:0
> TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:100
> Interrupt:18 Base address:0xb800
>
> eth2 Link encap:Ethernet HWaddr 00:40:05:A1:D9:09
> inet addr:192.168.2.64 Bcast:192.168.2.255 Mask:255.255.255.0
> UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
> RX packets:0 errors:38 dropped:0 overruns:0 frame:0
> TX packets:0 errors:5 dropped:0 overruns:0 carrier:15
> collisions:0 txqueuelen:100
> Interrupt:17 Base address:0xb400
>
> Then a ping to another similarly bonded node produces varying results:
> - a complete freeze, reset required.
> - ping waits, ctrl-c stops it.
> - ping works, with almost double speed
>
> When ping works, netperf -H node is sometimes almost twice as fast
> (175 Mb/s) as single-channel communication (94 Mb/s) and sometimes much
> slower (10-25 Mb/s), even though ping indicates improved communication
> times.
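
A side note on the output above: eth2 shows errors:38 and carrier:15
with RX packets:0 while eth1 is clean, which looks more like a bad link
on the eth2 port than a bonding bug, and would explain the erratic
throughput. I would benchmark each path separately; a sketch, assuming
netserver is running on the remote node (192.168.2.65 is a placeholder):

    # 30 s bulk-transfer test; run it once over the bond and once with
    # each card configured on its own to compare
    netperf -H 192.168.2.65 -t TCP_STREAM -l 30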
>
> In conclusion, channel bonding appears unreliable with this configuration.
>
> Since several messages reporting problems with the present channel
> bonding capability of the Linux kernel have been posted to this list
> (and to the tulip list regarding the tulip drivers), it would be useful
> if people with working combinations could share their detailed working
> specs: kernel (is 2.2.17 better?), NIC/driver (which tulip version?),
> etc. I am sure this would be much appreciated by those wanting to bond
> their Beowulf.