[Beowulf] Tips for diagnosing intermittent problems on a small cluster

David Mathog mathog at caltech.edu
Mon Nov 26 09:04:34 PST 2007


amacater at galactic.demon.co.uk (Andrew M.A. Cater) wrote:


> 
> There _may_ be some PSU involvement with ours: the machine and fans are 
> running but not accepting connections. You have to disconnect the power
> for a few minutes for it to even boot again properly. Powercycling from 
> the front panel doesn't always work

The Tyan S2466N was (in our case, is) notorious for this.  This
mobo had several design flaws, and this was one of the more
annoying ones.  Hitting the reset switch should be sufficient to
restart a computer in all instances, but with this board it was
often necessary to pull the plug, wait a while, and plug it back
in to clear whatever nonsense state the motherboard was in.
This was not a power supply issue: the reset switch goes to the
motherboard, not the power supply.  If the motherboard doesn't
handle a reset properly, it is not the power supply's fault.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


