[Beowulf] Re: PowerEdge SC 1435: Unexplained Crashes.

John Hearns hearnsj at googlemail.com
Fri Oct 10 08:22:40 PDT 2008

2008/10/10 Rahul Nabar <rpnabar at gmail.com>

> Thanks John. I will do that. A question: how likely is it that this is
> a software issue and not hardware from my symptoms? They keep harping
> on the fact that I am running a non-validated OS.

I have had that line from a few companies.
Looks like you are talking to first or second line support people - this is
what they have been trained to say.
You can understand this - it stops idiots calling in to them who are trying
to run some spit-and-sawdust distribution they got free with a packet of
breakfast cereal.
Again, just use your salesman or account manager and say that you spent $$$
with Dell to run applications,
and that the systems were sold to you as being able to run Linux.

> I have the latest. But that's only based on the version #. I will dig
> deeper. Could this be bad BIOS, though, from the symptoms? So, some
> code somewhere switches the state of that LED from blue to orange and
> if only I knew what the trigger was supposed to be. Someone had to
> write that!

The light coming on indicates a fault.
You should be interrogating the onboard management controller (called an
IPMI card or a BMC, or in Dell speak I think a DRAC).
Log onto the suspect nodes and run ipmitool:

ipmitool -I open sel elist
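
If there are several suspect nodes, the same query can be wrapped in a small
loop. This is only a rough sketch, and it makes assumptions that are not from
this thread: passwordless ssh to each node (as a user allowed to talk to the
local IPMI device, usually root), ipmitool installed on the nodes, and the
OpenIPMI kernel driver loaded so that "-I open" works. The function names and
the DRY_RUN switch are made up for illustration.

```shell
#!/bin/sh
# Sketch: dump the IPMI System Event Log (SEL) from several nodes.
# Assumptions (hypothetical, not from the thread): passwordless ssh to
# each node, ipmitool installed there, OpenIPMI driver loaded.

# Query one node. With DRY_RUN=1 the command is only printed, not run.
sel_for_node() {
    node=$1
    cmd="ipmitool -I open sel elist"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ssh $node $cmd"
    else
        echo "=== $node ==="
        ssh "$node" "$cmd"
    fi
}

# Query every node name passed as an argument, one after another.
sel_for_nodes() {
    for n in "$@"; do
        sel_for_node "$n"
    done
}
```

Something like "sel_for_nodes node01 node02" (node names invented here) would
then print each node's event log under its own header. Setting DRY_RUN=1 first
shows the ssh commands without running them.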

Do you not get a fault code on the little screen beside the light?
