channel bonding weirdness

Fri Nov 17 15:07:19 PST 2000

Hi

We are experimenting with channel bonding in our cluster. We are using 8
dual PIII600s, Tulip-based NICs, a Cisco 34xx switch, bonding-0.2.tar,
and kernel 2.2.14.

Channel bonding seems to work, but with some performance problems and
some oddities. 

One issue is that the machine will not boot into a bonded configuration:
eth0 gets 0...0 as its MAC address, while eth1 comes up OK. Putting
'ifenslave bond0 eth0' in /etc/rc.d/init.d/network fixes this, and the
machine then boots into a bonded configuration. Running netperf
immediately after booting shows about 107 Mb/s. The previous unbonded
configuration showed 94 Mb/s.

What happens next is most perplexing:

If we run 'ifenslave bond0 eth1', netperf jumps to 137 Mb/s. And each
time we run:

ifenslave bond0 eth0
ifenslave bond0 eth1

we get a slight increase in performance. In fact, if we run these two
commands repeatedly (on the order of 100 times), netperf climbs to
about 180 Mb/s, where it reaches a plateau.
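
For anyone who wants to reproduce this, the repeated re-enslaving can be
sketched as a small shell script. This is only a sketch under stated
assumptions: the host name node2, the RUNS, EVERY and DRY_RUN variables,
and the run and reenslave_loop helpers are illustrative names, not part
of our setup or of the bonding package. By default it only prints the
commands it would run.

```shell
#!/bin/sh
# Sketch: repeat the two ifenslave commands and sample netperf periodically.
# TARGET is a hypothetical netperf server; RUNS and EVERY are illustrative.
TARGET=${TARGET:-node2}
RUNS=${RUNS:-100}        # how many times to repeat the ifenslave pair
EVERY=${EVERY:-10}       # measure throughput every EVERY iterations
DRY_RUN=${DRY_RUN:-1}    # 1 = only print commands, 0 = really execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

reenslave_loop() {
    i=1
    while [ "$i" -le "$RUNS" ]; do
        run ifenslave bond0 eth0
        run ifenslave bond0 eth1
        if [ $((i % EVERY)) -eq 0 ]; then
            echo "# after $i iterations"
            run netperf -H "$TARGET"
        fi
        i=$((i + 1))
    done
}
```

Calling reenslave_loop with DRY_RUN=0 as root on a bonded node would run
the commands for real; comparing the netperf figures across samples
should show whether the gradual climb is reproducible.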

At this point, running '/etc/rc.d/init.d/network restart' does not
change the performance, but rebooting drops it back to 107 Mb/s.

Another issue is that we get an intermittent kernel oops when we kill
the bond0, eth0, or eth1 devices. I think this may be related to the
memory leak that was mentioned on the list some time ago.

Does anybody have any insight into the weird performance problems that
we are having?

thanks

Jon