problems with etherchannel and NatSemi DP83815 cards
Anders Lennartsson
anders.lennartsson at foi.se
Tue Mar 6 07:53:49 PST 2001
Hi
BACKGROUND:
I'm setting up a Debian GNU/Linux-based cluster, currently with 4 nodes, each a PPro 200 :( but more/other hardware may be coming :).
Considering the costs, we settled on Netgear 311 Ethernet cards, which are supported in 2.4.x kernels. Patches are available for 2.2.x kernels, but since 2.4 is here...
I have checked, and the driver is a slightly modified version derived from the natsemi.c available on www.scyld.com. The latter has some additions that are not included in the version provided in the kernel source, though.
Initially I put one card in each machine and verified that everything worked.
I tested with NPtcp (netperf derivative?), and the throughput asymptotically went up to about 90 Mbit/s when two cards were connected through a 100 Mbit/s switch (where are the last 10?).
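Part of the missing 10 Mbit/s is plain protocol overhead. A back-of-envelope sketch, assuming a 1500-byte MTU, IPv4 and TCP headers without options, a full-duplex link and the usual per-frame Ethernet overhead (preamble, header, FCS, inter-frame gap): under those assumptions the ceiling for TCP payload on a 100 Mbit/s link is about 94.9 Mbit/s (about 94.1 Mbit/s with the 12-byte TCP timestamp option), so framing alone would account for roughly half of the gap. The function name is only illustrative.

```python
# Back-of-envelope TCP goodput on 100 Mbit/s Ethernet.
# Assumptions: 1500-byte MTU, IPv4 and TCP headers without options,
# standard per-frame overhead on the wire, full-duplex link (ACKs
# travel in the other direction and do not steal bandwidth).

PREAMBLE_SFD = 8      # preamble + start-of-frame delimiter, bytes
ETH_HEADER = 14       # destination MAC, source MAC, EtherType
FCS = 4               # frame check sequence
INTERFRAME_GAP = 12   # mandatory idle time between frames, in byte times
IP_HEADER = 20
TCP_HEADER = 20


def tcp_goodput_mbps(link_mbps: float, mtu: int = 1500, tcp_options: int = 0) -> float:
    """Upper bound on the TCP payload rate for back-to-back full-size frames."""
    payload = mtu - IP_HEADER - TCP_HEADER - tcp_options
    on_wire = PREAMBLE_SFD + ETH_HEADER + mtu + FCS + INTERFRAME_GAP
    return link_mbps * payload / on_wire


if __name__ == "__main__":
    print(f"without TCP options: {tcp_goodput_mbps(100):.1f} Mbit/s")
    print(f"with TCP timestamps: {tcp_goodput_mbps(100, tcp_options=12):.1f} Mbit/s")
```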
Then I moved on to etherchannel bonding.
It was a bit tricky to find a working ifenslave.c: the one on www.beowulf.org seemed old, but I found a newer one at pdsf.nersc.gov/linux/
Then it seemed to work after doing:
ifconfig bond0 192.168.1.x netmask 255.255.255.0 up
./ifenslave bond0 eth0
(bond0 gets the MAC address from eth0)
./ifenslave bond0 eth1
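Once both cards are enslaved, outgoing frames are spread over eth0 and eth1. Below is a toy model of a round-robin transmit policy, assuming that is the policy this bonding driver uses (the almost identical TX packet counts of eth0 and eth1 in the ifconfig output below are consistent with it); the class is hypothetical and is not the driver's code. It also illustrates that consecutive frames of a single transfer alternate between the two links, so the receiver may see them out of order if one path is delayed.

```python
from itertools import cycle


class RoundRobinBond:
    """Hypothetical toy model: each outgoing frame goes to the next slave in turn."""

    def __init__(self, slaves):
        self.slaves = list(slaves)
        self._next_slave = cycle(self.slaves)
        # Frames handed to each slave, in transmit order.
        self.queues = {name: [] for name in self.slaves}

    def transmit(self, frame):
        """Queue one frame on the next slave and return that slave's name."""
        slave = next(self._next_slave)
        self.queues[slave].append(frame)
        return slave


if __name__ == "__main__":
    bond = RoundRobinBond(["eth0", "eth1"])
    for seq in range(6):
        bond.transmit(seq)
    print(bond.queues)  # {'eth0': [0, 2, 4], 'eth1': [1, 3, 5]}
```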
When I tested the setup by ftping a large file between two nodes, messages of the following type were printed repeatedly on the console:
ethX: Something Wicked happened! 0YYY
where X was 0 or 1 and YYY was one of 500, 700, 740 or 749; see below.
The same thing happened when running NPtcp once the packet size exceeded a few kbytes, at speeds of approximately 50 Mbit/s.
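The four-digit value after "Something Wicked happened!" appears to be the card's interrupt status word, printed in hex. A small decoder follows, assuming the low eleven bits follow the interrupt status register (ISR) layout of the DP83815 datasheet (bit 0 RXOK up to bit 10 TXURN); the table and function are hypothetical helpers. Under that assumption every reported value includes TXERR and TXURN (transmit error, transmit FIFO underrun), which would fit the TX errors/overruns/carrier counts of eth0 and eth1 in the ifconfig output below.

```python
# Hypothetical helper: decode the hex status word printed after
# "Something Wicked happened!". ASSUMPTION: the low 11 bits follow the
# DP83815 interrupt status register (ISR) layout; check against the datasheet.
ISR_BITS = {
    0: "RXOK",     # receive completed
    1: "RXDESC",   # receive descriptor interrupt
    2: "RXERR",    # receive error
    3: "RXEARLY",  # early receive threshold reached
    4: "RXIDLE",   # receive state machine idle
    5: "RXORN",    # receive overrun
    6: "TXOK",     # transmit completed
    7: "TXDESC",   # transmit descriptor interrupt
    8: "TXERR",    # transmit error
    9: "TXIDLE",   # transmit state machine idle
    10: "TXURN",   # transmit FIFO underrun
}


def decode_status(word: str) -> list[str]:
    """Return the names of the set bits, lowest bit first; unknown bits as BITn."""
    value = int(word, 16)
    names = []
    bit = 0
    while value >> bit:
        if (value >> bit) & 1:
            names.append(ISR_BITS.get(bit, f"BIT{bit}"))
        bit += 1
    return names


if __name__ == "__main__":
    for word in ("0500", "0700", "0740", "0749"):
        print(word, decode_status(word))
```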
QUESTIONS:
Does anyone have ideas about the cause of, or a solution to, this problem?
I suppose the PCI interface on these particular motherboards may play a significant role. Maybe the driver itself? Or is the processor simply too slow?
Does anyone have experience of this with, for instance, the 3c905? That is otherwise a very stable card IMHO. It is about three times more expensive, which isn't much for one or two cards, although I can imagine substantial savings for a large cluster. But if my hours are included ...
Regards,
Anders
SOME DETAILED INFO:
From syslog, the kernel identifying the network cards (eth2 is for access from outside the dedicated networks):
Mar 1 21:30:53 beo101 kernel: http://www.scyld.com/network/natsemi.html
Mar 1 21:30:53 beo101 kernel: (unofficial 2.4.x kernel port, version 1.0.3, January 21, 2001 Jeff Garzik, Tjeerd Mulder)
Mar 1 21:30:53 beo101 kernel: eth0: NatSemi DP83815 at 0xc4800000, 00:02:e3:03:da:87, IRQ 12.
Mar 1 21:30:53 beo101 kernel: eth0: Transceiver status 0x7869 advertising 05e1.
Mar 1 21:30:53 beo101 kernel: eth1: NatSemi DP83815 at 0xc4802000, 00:02:e3:03:de:43, IRQ 10.
Mar 1 21:30:53 beo101 kernel: eth1: Transceiver status 0x7869 advertising 05e1.
Mar 1 21:30:53 beo101 kernel: eth2: NatSemi DP83815 at 0xc4804000, 00:02:e3:03:dc:2c, IRQ 11.
Mar 1 21:30:53 beo101 kernel: eth2: Transceiver status 0x7869 advertising 05e1.
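The "Transceiver status 0x7869 advertising 05e1" lines look like the PHY's MII basic status register (BMSR) and auto-negotiation advertisement register (ANAR). Decoding them, assuming the standard IEEE 802.3 clause 22 bit layout, suggests all three PHYs support 10/100 Mbit/s in half and full duplex and advertise all four modes plus PAUSE. The tables and function names are illustrative, and only the bits needed here are named.

```python
# ASSUMPTION: standard IEEE 802.3 clause 22 MII register layouts.
BMSR_BITS = {
    15: "100BASE-T4",
    14: "100BASE-TX full duplex",
    13: "100BASE-TX half duplex",
    12: "10BASE-T full duplex",
    11: "10BASE-T half duplex",
    6: "preamble suppression",
    5: "auto-negotiation complete",
    4: "remote fault",
    3: "auto-negotiation ability",
    2: "link up (latched low)",
    1: "jabber detected",
    0: "extended capability",
}

ANAR_BITS = {
    15: "next page",
    13: "remote fault",
    10: "PAUSE",
    9: "100BASE-T4",
    8: "100BASE-TX full duplex",
    7: "100BASE-TX half duplex",
    6: "10BASE-T full duplex",
    5: "10BASE-T half duplex",
}


def set_bits(value: int, table: dict[int, str]) -> list[str]:
    """Names of the table bits that are set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if (value >> bit) & 1]


def decode_anar(value: int) -> tuple[int, list[str]]:
    """Split ANAR into its 5-bit selector field (1 = IEEE 802.3) and named bits."""
    return value & 0x1F, set_bits(value, ANAR_BITS)


if __name__ == "__main__":
    print("BMSR 0x7869:", set_bits(0x7869, BMSR_BITS))
    print("ANAR 0x05e1:", decode_anar(0x05E1))
```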
Some lines of the wicked messages (above them are the two lines where eth0 and eth1 are reported when ifenslave is run):
Mar 1 21:30:56 beo101 /usr/sbin/cron[189]: (CRON) STARTUP (fork ok)
Mar 1 21:35:26 beo101 kernel: eth0: Setting full-duplex based on negotiated link capability.
Mar 1 21:35:32 beo101 ntpd[182]: time reset -0.474569 s
Mar 1 21:35:32 beo101 ntpd[182]: kernel pll status change 41
Mar 1 21:35:32 beo101 ntpd[182]: synchronisation lost
Mar 1 21:35:37 beo101 kernel: eth1: Setting full-duplex based on negotiated link capability.
Mar 1 21:38:01 beo101 /USR/SBIN/CRON[211]: (mail) CMD ( if [ -x /usr/sbin/exim -a -f /etc/exim.conf ]; then /usr/sbin/exim -q >/dev/null 2>&1; fi)
Mar 1 21:39:49 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar 1 21:40:04 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar 1 21:40:08 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar 1 21:40:08 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar 1 21:40:12 beo101 last message repeated 2 times
Mar 1 21:40:12 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar 1 21:40:13 beo101 last message repeated 2 times
Mar 1 21:40:15 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar 1 21:40:16 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar 1 21:40:18 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar 1 21:40:19 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar 1 21:40:19 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar 1 21:40:20 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar 1 21:40:20 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar 1 21:40:21 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar 1 21:40:22 beo101 last message repeated 3 times
Mar 1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar 1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar 1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0500.
Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0740.
Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0740.
Mar 1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar 1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0740.
Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0740.
Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0500.
Mar 1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0500.
The result of ifconfig:
bond0     Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1834429 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:986886789 (941.1 Mb)

eth0      Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:907798 errors:0 dropped:0 overruns:0 frame:0
          TX packets:915439 errors:1776 dropped:0 overruns:1776 carrier:1776
          collisions:0 txqueuelen:100
          RX bytes:435552233 (415.3 Mb)  TX bytes:491795214 (469.0 Mb)
          Interrupt:12

eth1      Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:907768 errors:0 dropped:0 overruns:0 frame:0
          TX packets:915466 errors:1748 dropped:0 overruns:1748 carrier:1748
          collisions:0 txqueuelen:100
          RX bytes:434992308 (414.8 Mb)  TX bytes:489766183 (467.0 Mb)
          Interrupt:10 Base address:0x2000

eth2      Link encap:Ethernet  HWaddr 00:02:E3:03:DC:2C
          inet addr:150.227.64.210  Bcast:150.227.64.255  Mask:255.255.255.0
          UP BROADCAST RUNNING  MTU:1500  Metric:1
          RX packets:13122 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1182 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:1032660 (1008.4 Kb)  TX bytes:943713 (921.5 Kb)
          Interrupt:11 Base address:0x4000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3904  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:552 (552.0 b)  TX bytes:552 (552.0 b)
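The TX counters above can be cross-checked with a few lines of arithmetic. The numbers in this sketch are copied from the ifconfig output. On them, bond0's TX packet count (1834429) equals exactly the slaves' TX packets plus their TX errors, and the error rate is roughly 0.19% of transmitted packets on each card. That bond0 counts every frame it hands to a slave, failed ones included, is an inference from these numbers, not documented driver behavior.

```python
# TX counters copied from the ifconfig output above.
tx_packets = {"bond0": 1834429, "eth0": 915439, "eth1": 915466}
tx_errors = {"eth0": 1776, "eth1": 1748}

slaves = ("eth0", "eth1")

# Packets the slaves reported as sent, plus the ones they reported as failed.
slave_total = sum(tx_packets[s] + tx_errors[s] for s in slaves)

if __name__ == "__main__":
    print("bond0 TX packets:         ", tx_packets["bond0"])
    print("slave TX packets + errors:", slave_total)
    for s in slaves:
        rate = tx_errors[s] / tx_packets[s]
        print(f"{s}: {tx_errors[s]} TX errors, {rate:.3%} of TX packets")
```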
More information about the Beowulf mailing list