3C905B problem
Martin Siegert
siegert@sfu.ca
Tue Apr 18 17:15:53 2000
Hi there,
I am running a small Beowulf cluster (8 dual-processor PIII-500MHz
nodes) using RedHat 6.1, but with the kernel upgraded to 2.2.14 (SMP).
The master node has three ethernet cards, all 3Com 3c905B's (eth0 for
the outside world, eth1 to the switch that connects to the other
nodes, and eth2 to the backup net). Starting yesterday, eth1 has been
failing "out of the blue" (after running uninterrupted and without
problems for 74 days). The symptoms are: ssh and rsh (TCP) stop
working, ping (ICMP) stops, but ruptime (UDP) still works.
"ifconfig eth1 down;ifconfig eth1 up" brings the interface up again.
This happened twice yesterday and already twice today. There is nothing
in the logfiles that indicates a problem. Furthermore, running
"ifconfig eth1 down;ifconfig eth1 up" randomly causes some of the
nodes to hang (in that case ssh/rsh stop working and ruptime stops as
well, but ping still works; however, I can't even log in from the
console, so the only choice is to press the reset button on those
nodes).
In this case the syslog shows the message
b05 kernel: nfs: server b01 not responding, timed out
just before it hangs. I am using 3Com's 3c90x.o module from
http://support.3com.com/infodeli/tools/nic/linuxdownloading.htm
on all nodes.
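
To make the two failure patterns above easier to tell apart when they
recur, here is a minimal sketch of a probe that could be run against a
host. It is only an illustration: the function names, the state labels,
the choice of port 22 (ssh) and the ping flags (Linux iputils syntax)
are my assumptions, and the UDP status has to be supplied by hand, for
example from whether ruptime still lists the host as up.

```python
import socket
import subprocess


def tcp_ok(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time.

    Port 22 (ssh) is an assumption; use 514 to test rsh instead.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def icmp_ok(host, timeout=3):
    """Return True if a single ping gets a reply.

    Assumes a Linux iputils-style ping (-c count, -W timeout in seconds).
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False
    return result.returncode == 0


def classify(tcp, icmp, udp):
    """Map three probe results to the patterns described in this post.

    The labels are hypothetical names chosen for this sketch.
    """
    if tcp and icmp and udp:
        return "healthy"
    if not tcp and not icmp and udp:
        # ssh/rsh and ping fail, ruptime still works: the eth1 failure
        return "eth1-stall"
    if not tcp and icmp and not udp:
        # ssh/rsh and ruptime fail, ping still works: the node hang
        return "node-hang"
    return "other"
```

For a node such as b05 this would be called as
classify(tcp_ok("b05"), icmp_ok("b05"), udp_seen), where udp_seen is
True if ruptime still reports the host as up. It only records which
pattern occurred; it does not explain or fix either one.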
Has anybody experienced similar failures?
Any suggestions on what I might try?
(I'm kind of desperate right now).
Thanks for the help.
Martin
========================================================================
Martin Siegert
Academic Computing Services phone: (604) 291-4691
Simon Fraser University fax: (604) 291-4242
Burnaby, British Columbia email: siegert@sfu.ca
Canada V5A 1S6
========================================================================