[Beowulf] Odd SuperMicro power off issues
csamuel at vpac.org
Mon Apr 6 05:53:21 PDT 2009
Back in December...
----- "Chris Samuel" <csamuel at vpac.org> wrote:
> Hi folks,
> We've been tearing our hair out over this for a little
> while and so I'm wondering if anyone else has seen anything
> like this before, or has any thoughts about what could be
> happening ?
> Very occasionally we find one of our Barcelona nodes with
> a SuperMicro H8DM8-2 motherboard powered off. IPMI reports
> it as powered down too.
Well, we've been gradually replacing the Barcelona chips
with Shanghai (same clock speed) and we have yet to see a
power off on a Shanghai node!
We've got over half the machine changed over so far,
and some of them have been in for a fair while, so it
does seem to be statistically significant.
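
A rough way to sanity-check a claim like this is to ask how likely
zero power-offs would be by chance if the new nodes failed at the old
rate. The sketch below assumes power-offs arrive as a Poisson process
with a constant per-node-day rate. The function name and every number
in it are hypothetical placeholders for illustration, not measurements
from this cluster.

```python
import math


def prob_zero_events(rate_per_node_day: float, node_days: float) -> float:
    """Probability of seeing no events at all, assuming events follow a
    Poisson process with the given rate per node-day."""
    return math.exp(-rate_per_node_day * node_days)


# Hypothetical placeholder figures, NOT data from the original post.
barcelona_power_offs = 6        # power-offs seen on Barcelona nodes
barcelona_node_days = 20000.0   # total Barcelona node-days observed
shanghai_node_days = 10000.0    # total Shanghai node-days observed so far

# Estimate the old failure rate, then ask how surprising zero events is.
rate = barcelona_power_offs / barcelona_node_days
p = prob_zero_events(rate, shanghai_node_days)
print(f"P(zero Shanghai power-offs by chance) = {p:.3f}")
```

A small probability would support the view that the CPU swap matters.
A value near 1 would mean the Shanghai nodes simply haven't been
watched for long enough to tell.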
We know it's not the BIOS update, as we updated some
nodes in advance of getting the Shanghais in and they
still powered off with Barcelona.
We do still see the occasional logless hang on the
Shanghai nodes (as we did with Barcelona) but I
suspect that's going to be a different battle.
All the best,
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency