problems with etherchannel and NatSemi DP83815 cards

Anders Lennartsson anders.lennartsson at foi.se
Wed Mar 7 03:56:03 PST 2001


Hi

BACKGROUND:

I'm setting up a Debian GNU/Linux based cluster, currently with 4 nodes,
each a PPro 200 :( but there may be more/other stuff coming :).
Considering the costs, we settled on Netgear 311 ethernet cards, for
which there is support in 2.4.x kernels. Patches are available for
2.2.x kernels, but since 2.4 is here...
I have checked, and the driver is a slightly modified version of the
natsemi.c available on www.scyld.com. The latter has some additions
that are not included in the one provided in the kernel source, though.

Initially I put one card in each machine and verified that everything
worked. I tested with NTtcp (netperf derivative?) and the throughput
asymptotically went up to about 90 Mbit/s when two cards were connected
through a 100 Mbit/s switch (where are the last 10?).
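
On the missing 10: part of the gap is framing overhead rather than a fault. A rough sketch of the arithmetic, assuming standard Ethernet framing (8-byte preamble, 14-byte MAC header, 4-byte FCS, 12-byte inter-frame gap), a 1500-byte MTU, and 20-byte IPv4 and TCP headers without options. The function name is made up for illustration.

```python
def max_tcp_goodput_mbps(link_mbps=100.0, mtu=1500, ip_hdr=20, tcp_hdr=20):
    """Upper bound on TCP payload rate over Ethernet, in Mbit/s."""
    # Bytes on the wire per frame that are not part of the IP packet:
    # preamble + MAC header + FCS + inter-frame gap.
    wire_overhead = 8 + 14 + 4 + 12
    payload = mtu - ip_hdr - tcp_hdr
    return link_mbps * payload / (mtu + wire_overhead)


print(f"{max_tcp_goodput_mbps():.1f} Mbit/s")
```

Under these assumptions the ceiling is a little under 95 Mbit/s, so roughly half of the missing 10 is protocol overhead. The rest would have to come from the hosts, the driver, or the switch.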

Then I set out for etherchannel bonding.
It was a bit tricky to find a working ifenslave.c:
the one on www.beowulf.org seemed old, and I found a newer one at
pdsf.nersc.gov/linux/
Then it seemed to work after doing:

ifconfig bond0 192.168.1.x netmask 255.255.255.0 up
./ifenslave bond0 eth0
(bond0 gets the MAC address from eth0)
./ifenslave bond0 eth1 

When testing the setup by ftping a large file between two nodes,
messages of the following type were output repeatedly on the console:

ethX ... Something wicked happened! 0YYY
where X was 0 or 1 and YYY was one of 500, 700, 740, 749, see
below.
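
The status word in these messages looks like a hexadecimal bit mask of interrupt status flags. A minimal sketch for decoding it, assuming the bit layout of the interrupt status enum in natsemi.c. The names and values are recalled from the driver source and not verified against version 1.0.3, so check them against your copy. decode_status is a made-up helper name.

```python
# ASSUMPTION: bit names/values as recalled from the interrupt status enum
# in natsemi.c; verify against the driver source actually in use.
INTR_BITS = {
    0x0001: "RxDone", 0x0002: "RxIntr", 0x0004: "RxErr", 0x0008: "RxEarly",
    0x0010: "RxIdle", 0x0020: "RxOverrun",
    0x0040: "TxDone", 0x0080: "TxIntr", 0x0100: "TxErr", 0x0200: "TxIdle",
    0x0400: "TxUnderrun", 0x0800: "StatsMax",
}


def decode_status(hex_word):
    """Return the names of the bits set in a logged status word such as '0700'."""
    value = int(hex_word, 16)
    return [name for bit, name in sorted(INTR_BITS.items()) if value & bit]


for word in ("0500", "0700", "0740"):
    print(word, decode_status(word))
```

If that table is right, 0500, 0700 and 0740 all include a transmit underrun flag, which would fit the TX overruns counted by ifconfig.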

The same thing happened when running NPtcp once the packet size rose
above a few kbytes, at speeds of approx. 50 Mbit/s.

QUESTIONS:

Does anyone have ideas as to the nature/solution of this problem?
I suppose the PCI interface on these particular motherboards may play a
significant role. Maybe the driver itself? Or is the processor just too
slow?

Does anyone have experience of this with, for instance, the 3c905?
Otherwise a very stable card IMHO.
It is about three times more expensive, which isn't that much for
one or two cards, although I could imagine substantial savings
for a large cluster. But if my hours are included ...

Regards,
Anders

SOME DETAILED INFO:

From syslog, kernel identifying network cards (eth2 is for access from
outside the dedicated networks):

Mar  1 21:30:53 beo101 kernel:   http://www.scyld.com/network/natsemi.html
Mar  1 21:30:53 beo101 kernel:   (unofficial 2.4.x kernel port, version 1.0.3, January 21, 2001 Jeff Garzik, Tjeerd Mulder)
Mar  1 21:30:53 beo101 kernel: eth0: NatSemi DP83815 at 0xc4800000, 00:02:e3:03:da:87, IRQ 12.
Mar  1 21:30:53 beo101 kernel: eth0: Transceiver status 0x7869 advertising 05e1.
Mar  1 21:30:53 beo101 kernel: eth1: NatSemi DP83815 at 0xc4802000, 00:02:e3:03:de:43, IRQ 10.
Mar  1 21:30:53 beo101 kernel: eth1: Transceiver status 0x7869 advertising 05e1.
Mar  1 21:30:53 beo101 kernel: eth2: NatSemi DP83815 at 0xc4804000, 00:02:e3:03:dc:2c, IRQ 11.
Mar  1 21:30:53 beo101 kernel: eth2: Transceiver status 0x7869 advertising 05e1.

Some lines with the wicked message (above them are the two lines where
eth0 and eth1 are reported when ifenslave is run):

Mar  1 21:30:56 beo101 /usr/sbin/cron[189]: (CRON) STARTUP (fork ok)
Mar  1 21:35:26 beo101 kernel: eth0: Setting full-duplex based on negotiated link capability.
Mar  1 21:35:32 beo101 ntpd[182]: time reset -0.474569 s
Mar  1 21:35:32 beo101 ntpd[182]: kernel pll status change 41
Mar  1 21:35:32 beo101 ntpd[182]: synchronisation lost
Mar  1 21:35:37 beo101 kernel: eth1: Setting full-duplex based on negotiated link capability.
Mar  1 21:38:01 beo101 /USR/SBIN/CRON[211]: (mail) CMD (  if [ -x /usr/sbin/exim -a -f /etc/exim.conf ]; then /usr/sbin/exim -q >/dev/null 2>&1; fi)
Mar  1 21:39:49 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:04 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:08 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:08 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:12 beo101 last message repeated 2 times
Mar  1 21:40:12 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:13 beo101 last message repeated 2 times
Mar  1 21:40:15 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:16 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:18 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:19 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:19 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:20 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:20 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:21 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:22 beo101 last message repeated 3 times
Mar  1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0500.
Mar  1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0740.
Mar  1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0740.
Mar  1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0740.
Mar  1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0740.
Mar  1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0500.
Mar  1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0500.
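
To see which interface and status code dominate, the log can be tallied mechanically. A minimal sketch, assuming the syslog line format shown above and that a "last message repeated N times" line refers to the message directly before it. The function name tally and the sample list are illustrative only; the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

WICKED = re.compile(r"kernel: (eth\d+): Something Wicked happened! ([0-9a-fA-F]+)")
REPEAT = re.compile(r"last message repeated (\d+) times")


def tally(lines):
    """Count wicked messages per (interface, status code), including repeats."""
    counts = Counter()
    last = None  # key of the previous line if it was a wicked message
    for line in lines:
        match = WICKED.search(line)
        if match:
            last = (match.group(1), match.group(2))
            counts[last] += 1
            continue
        repeat = REPEAT.search(line)
        if repeat and last is not None:
            counts[last] += int(repeat.group(1))
        else:
            last = None  # unrelated line: a later repeat is not ours
    return counts


sample = [
    "Mar  1 21:40:08 beo101 kernel: eth0: Something Wicked happened! 0700.",
    "Mar  1 21:40:12 beo101 last message repeated 2 times",
    "Mar  1 21:40:12 beo101 kernel: eth1: Something Wicked happened! 0700.",
    "Mar  1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0500.",
]
for (iface, code), n in sorted(tally(sample).items()):
    print(iface, code, n)
```

Feeding the whole syslog file through tally (for example via open(path)) gives the same breakdown for a full run.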

The result of ifconfig:

bond0     Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87  
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1834429 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 b)  TX bytes:986886789 (941.1 Mb)

eth0      Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87  
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:907798 errors:0 dropped:0 overruns:0 frame:0
          TX packets:915439 errors:1776 dropped:0 overruns:1776 carrier:1776
          collisions:0 txqueuelen:100 
          RX bytes:435552233 (415.3 Mb)  TX bytes:491795214 (469.0 Mb)
          Interrupt:12 

eth1      Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87  
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:907768 errors:0 dropped:0 overruns:0 frame:0
          TX packets:915466 errors:1748 dropped:0 overruns:1748 carrier:1748
          collisions:0 txqueuelen:100 
          RX bytes:434992308 (414.8 Mb)  TX bytes:489766183 (467.0 Mb)
          Interrupt:10 Base address:0x2000 

eth2      Link encap:Ethernet  HWaddr 00:02:E3:03:DC:2C  
          inet addr:150.227.64.210  Bcast:150.227.64.255  Mask:255.255.255.0
          UP BROADCAST RUNNING  MTU:1500  Metric:1
          RX packets:13122 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1182 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          RX bytes:1032660 (1008.4 Kb)  TX bytes:943713 (921.5 Kb)
          Interrupt:11 Base address:0x4000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3904  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:552 (552.0 b)  TX bytes:552 (552.0 b)
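
To put the TX error counters in proportion, a small sketch that computes errors per transmitted packet for the two slaves. The numbers are copied from the ifconfig listing above; the dictionary and function names are only illustrative.

```python
# (TX packets, TX errors) per slave interface, taken from the ifconfig output.
tx_counters = {
    "eth0": (915439, 1776),
    "eth1": (915466, 1748),
}


def tx_error_rate(packets, errors):
    """Fraction of transmitted packets that were counted as errors."""
    return errors / packets


for iface, (packets, errors) in tx_counters.items():
    print(f"{iface}: {100 * tx_error_rate(packets, errors):.2f}% TX errors")
```

Both slaves come out at roughly 0.2% of transmitted packets, and the two interfaces are affected about equally.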



