Mysterious kernel hangs

D. R. Holsbeck drh at niptron.com
Thu Mar 15 07:08:06 PST 2001


Felix Rauch wrote:
You might want to change the eepro module to the latest version.
We have had some of the same issues. At first we tried the Intel
version, which seemed to help, but performance wasn't so hot. We are
currently testing the latest one from
http://www.scyld.com/network/eepro100.html and have seen good
results so far.
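
Before swapping drivers, it may help to confirm which Ethernet module
each node is actually running. Below is a minimal Python sketch, assuming
the usual /proc/modules layout (module name in the first column). The
function names and the candidate module names (eepro100, and e100 for
the Intel driver) are illustrative assumptions, not something taken from
the nodes described here.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first whitespace-separated field is the module name.
            names.add(fields[0])
    return names


def nic_driver_in_use(proc_modules_text, candidates=("eepro100", "e100")):
    """Return the candidate driver names that show up as loaded modules.

    The candidate list is an assumption; adjust it for your own setup.
    A driver compiled into the kernel is not listed in /proc/modules,
    so an empty result does not prove that no such driver is active.
    """
    loaded = loaded_modules(proc_modules_text)
    return [name for name in candidates if name in loaded]


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        # Not on Linux, or /proc is not mounted.
        text = ""
    print(nic_driver_in_use(text))
```

Running this on every node before and after the driver change gives a
quick check that all machines really picked up the same module.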

> 
> We recently bought a new 16-node cluster with dual 1 GHz Pentium III
> nodes, but the machines mysteriously freeze :-(
> 
> The nodes have STL2 boards (Version A28808-301), onboard Adaptec SCSI
> controllers (7899P), onboard Intel Fast Ethernet adapters (82557
> [Ethernet Pro 100]) and additional Packet Engines Hamachi GNIC-II
> Gigabit Ethernet cards.
> 
> We tried kernels 2.2.x, 2.4.1 and now even 2.4.2-ac20, but it seems to
> be the same problem with all kernels: when we run experiments that
> use the network intensively, any of the machines will just freeze
> after a few hours. The frozen machine does not respond to anything, and
> so far we have not been able to see any log entries related to the
> freeze on virtual console 10 :-(   We have now switched on all the
> "Kernel Hacking" options in the kernel configuration (especially the
> logging) and will try again; hopefully we will at least see some log
> output.
> 
> The freezes also happen when we run non-network-intensive jobs on
> the machines (e.g. SETI@home), but they clearly happen less often.
> 
> Does anyone have any idea what could be going wrong, or what we could
> try in order to find the cause of the problems?
> 
> Regards,
> Felix
> --
> Felix Rauch                      | Email: rauch at inf.ethz.ch
> Institute for Computer Systems   | Homepage: http://www.cs.inf.ethz.ch/~rauch/
> ETH Zentrum / RZ H18             | Phone: ++41 1 632 7489
> CH - 8092 Zuerich / Switzerland  | Fax:   ++41 1 632 1307

-- 
drh at niptron.com

"Necessity is the mother of taking chances."
--Mark Twain.




