custom hardware (was: Xbox clusters?)


J Harrop jharrop at shaw.ca
Thu Nov 29 11:20:58 PST 2001


We have had similar problems over the years, some of which we tracked down 
to poor grounding in the building wiring.  I know of one location 
where the weather (rain in particular) can affect the behavior of part of 
the system.  I expect a grounding problem would produce similar symptoms 
with the newer power supplies, but I can't give a detailed 
explanation such as the excellent one posted.  I seem to recall that we 
also had this problem with the older power supplies.  The solution was the 
same: unplug, wait, reboot.

My favorite hardware problem was when I was working down in Honduras.  One 
of the laptops became more and more flaky and finally quit booting at 
all.  When I swapped out the CD-ROM module to try and boot from a floppy I 
found a stray ant sitting on the inside edge of the connector!  On further 
inspection the inside of the laptop turned out to be packed with them.  I 
wanted to duct-tape the machine closed and mail the box back to Dell with a 
"bug report" taped on it ;-)

John Harrop

At 10:02 AM 29/11/2001 -0500, you wrote:
>On Thu, Nov 29, 2001 at 09:15:15AM +0100, Daniel Pfenniger wrote:
> >
> > David Vos wrote:
> > >
> > ....
> > > There is one computer in our cluster that would make me think twice before
> > > doing a custom build.  I prefer to call it the node from heck.  It only
> > > has one problem: it won't boot.  If you press the power button, the
> > > powerlight flashes while the cpu and case fans turn a quarter turn, then
> > > nothing.  You have to wait a minute before you even get that reaction
> > > again.  (Sounds like a short somewhere).  The problem only surfaces if the
> > > computer has been off for a little while, and nearly every time at that.
> >
> > I have seen similar strange behavior of some boxes in a set of 66's, and the
> > way to restart is also rather odd.
> > Basically, and this has been repeatedly observed on several boxes of the same
> > composition (dual Pentium III with ASUS P2BD motherboard) aligned on a metallic
> > shelf, the ATX box would stop after months of activity, and the simplest found
> > way to restart it is to unplug everything (power and ethernet), touch it for
> > a few seconds with hands, replug and voila.  No need to open the box!
> > My guess is that some capacitor needs to be discharged, but exactly why
> > one needs to unplug every cable appears curious.
>
>One thing to understand is that, unless there is a physical
>switch on the power supply itself, ATX systems are never
>*really* turned off as long as they are plugged in -- they
>only go to a "standby" state, wherein +5V power is still
>being applied to a single pin (the purple wire). When you
>press the power button on the front of the chassis, it
>merely shorts a header that ultimately causes the
>motherboard to short the green wire in the ATX cable to
>ground -- this is a signal to the power supply to leave
>standby and start generating power for all the other
>outputs.
>
>Another thing to observe is that generally, ATX power
>supplies are switching supplies, which means that (to
>simplify things somewhat) they generate the correct voltage
>by charging and discharging a capacitor at a high rate. The
>switching controller constantly monitors the voltage on the
>capacitor and connects or disconnects the capacitor to the
>incoming supply, depending on whether the charge is above or
>below the desired level (the detailed truth behind this is
>fairly complex and typically involves multiple stages and
>inductors as well as capacitors, but this model is probably
>good enough for this discussion...). Thus, even when an ATX
>system is "off", the power supply is chugging along, keeping
>a capacitor charged to provide +5V at a low current. BTW, if
>you have the resources to do this, put a current sensor on
>the incoming AC line for a running system and feed the
>output to an oscilloscope.  You should see a series of
>alternating positive and negative spikes -- those are the
>capacitors charging at the peaks and troughs of the AC
>voltage.
>
>Now, if the ATX board were simply to run the green-wire
>contact straight through to the power on/off header, you
>wouldn't need much oomph at all on the +5V standby line, and
>older ATX power supplies in fact didn't. However, newer
>boards have things like Wake-on-LAN, Wake-on-Modem, and
>other various and sundry goodies that have to run off the
>+5V standby.  It has gotten to the point that, in order to
>do all the processing that is required to leave standby, the
>standby current draw is greater than what some older
>supplies can provide. So in the case of a power supply that
>either by design or fault cannot provide sufficient current
>under standby, what (I think) happens is that while the
>motherboard is waiting for the main supply voltages to come
>up to full power, the standby processing bleeds off the
>capacitor to the point that the standby voltage sags below
>the minimum required for operation. At that point, the
>standby processing halts, the motherboard stops holding the
>green wire to ground, and the power supply stops trying to
>power up. It then returns to standby mode, re-charges the
>standby capacitor, and the cycle begins again.
>
>If you have a system that is behaving like this, try putting
>a voltmeter on the standby pin of the ATX header (you can
>usually jab a probe down into the back of the connector).
>You should see it at +5V when the system is "off". Then
>press the system's "on" button and watch the voltage. You'll
>most likely see it sag down to a couple of volts or so.  If
>this doesn't happen, you've probably got some other problem,
>perhaps a POST failure of some sort. Also, this may not be
>the end of the diagnosis -- it is possible that the failure
>to provide enough current on standby may not be the fault of
>the power supply itself. It could be a faulty component
>(e.g. the SCSI drive we heard about) sucking down too much
>current on power-up, or an overburdened AC supply circuit
>that sags just a bit when your system starts up -- in the
>latter case I imagine that you could wind up with a
>seemingly jinxed spot in the equipment rack. :-)
>
>BTW, if the power supply has too little oomph on standby by
>*design*, the system will probably *never* power up.  If the
>supply's design meets the new spec only marginally, or if it
>is malfunctioning, say, because of a damaged or weakened
>capacitor, then it might behave differently when cold than
>it does when it is fully warmed up. In this event,
>unplugging the supply for a while and reconnecting it can
>create a short window in which the supply can get the system
>over the hump to leave standby. I in fact have a supply at
>home that has this problem, and I just sort of live with it
>because it's not my main system. Someday perhaps I'll
>replace the supply.
>
>As to why you have to disconnect the Ethernet as well, I
>really don't have a clue.
>
>HTH,
>--Bob Drzyzgula
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or unsubscribe) visit 
>http://www.beowulf.org/mailman/listinfo/beowulf
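
For anyone who wants to play with the standby-sag idea described in the
quoted explanation, here is a rough toy model in Python. It is only a
sketch under loud assumptions: the standby rail is treated as a single
capacitor fed by a current-limited source, and the function name
try_power_on and every number in it (capacitance, currents, minimum
voltage, ramp time) are made up for illustration, not taken from any ATX
spec or from a measured supply.

```python
def try_power_on(supply_ma, wake_load_ma, cap_uf=1000.0, v_nominal=5.0,
                 v_min=4.0, ramp_ms=100.0, dt_ms=0.1):
    """Simulate one press of the power button on a toy standby rail.

    Returns (succeeded, lowest_voltage). All parameters are hypothetical
    illustration values, not real ATX figures.
    """
    cap_f = cap_uf * 1e-6      # capacitance in farads
    dt_s = dt_ms * 1e-3        # time step in seconds
    v = v_nominal              # rail starts fully charged in standby
    lowest = v
    t = 0.0
    while t < ramp_ms:
        # Net current into the capacitor: what the supply can deliver
        # minus what the wake-up logic draws while the main rails ramp.
        net_a = (supply_ma - wake_load_ma) * 1e-3
        # dV = I * dt / C, and the regulator never charges above nominal.
        v = min(v_nominal, v + net_a * dt_s / cap_f)
        lowest = min(lowest, v)
        if v < v_min:
            # Standby logic browns out, the green wire is released,
            # and the supply drops back to standby.
            return False, lowest
        t += dt_ms
    return True, lowest


if __name__ == "__main__":
    for supply_ma in (100, 1000):
        ok, lowest = try_power_on(supply_ma, wake_load_ma=500)
        print(f"{supply_ma} mA standby supply: "
              f"{'powers up' if ok else 'falls back to standby'}, "
              f"lowest rail voltage {lowest:.2f} V")
```

In this model a supply that cannot cover the wake-up load lets the rail
sag below the minimum within a few milliseconds, which matches the
"flash, quarter turn of the fans, then nothing" symptom; a supply with
headroom holds the rail at its nominal value for the whole ramp.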



