[Beowulf] Re: Odd AMD quad core SuperMicro power off issues

Chris Samuel csamuel at vpac.org
Mon Jul 6 17:53:39 PDT 2009

----- "David Mathog" <mathog at caltech.edu> wrote:

Hi David,

> Chris Samuel wrote:
> > Since I wrote that we have seen far fewer with 2.3GHz
> > Shanghai (2376, a 75W part), *but* we have some 
> some as in:  some of the upgraded nodes do this, some do not?

Some as in any of the ones we've had the chance to isolate
and run tests on (the others are running user jobs).

> Refresh our memory on this, are you seeing orderly power
> off (as in a shutdown) or are the nodes just powering
> down like "boom"?

Fall down go boom. :-(

One second running with power light on, next second dead
with no power light.

> In the latter case I would tend to suspect that the power
> supply has issues and is triggering an emergency power off
> to prevent damage from overheating or overload.

We've duplicated this with a much higher capacity PSU
on the same board.

> Swapping the CPUs could make a difference if the newer
> ones use a bit less power than the older ones.

It's the lower power (55W) parts that power off, not
the higher power (75W) ones.  I would have thought it
would have been the other way around?

Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
