[Beowulf] PowerEdge SC 1435: Unexplained Crashes.

Nifty niftyompi Mitch niftyompi at niftyegg.com
Fri Oct 17 08:22:30 PDT 2008


On Thu, Oct 09, 2008 at 02:20:52PM -0500, Rahul Nabar wrote:
> 
> I have a PowerEdge SC 1435 that has a strange problem. We bought about 23 of
> these for a cluster and machines have been failing in a somewhat random manner
> in a peculiar way:
> 
> (1) Screen is blank. Front blue indicator turns steady orange.
> 
> (2) Cannot get it to reboot by pressing (or holding down) the power button.
> 
> (3) The only way to reboot is to cycle the power.
> 
> (4) After a reboot the machine works fine again, until the same failure recurs a few days later.
> 
> I ran DSET and the diagnostic CD, but they found nothing relevant.
> 
> Any tips what could be the faulty component? Or debug ideas? Right now I'm
> totally lost! Hardware / software? CPU / Motherboard / Power supply?
> 
> Does anybody know what exactly makes the indicator LED turn steady orange from its
> normal blue state? This is not one of the 4 numbered LEDs but the one to their
> right.
> 
> I posted this problem on a PowerEdge mailing list but haven't gotten
> very far yet. Any suggestions are appreciated!
> 

Check the baseboard management controller log (Ctrl+E).
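
If the node is still up and has in-band IPMI access, the same event log can usually
be read from the running OS instead of from the boot-time utility. The following is
a minimal sketch, assuming ipmitool is installed and the kernel IPMI drivers are
loaded; neither is confirmed for this cluster. The keyword list and the helper names
(parse_sel, suspicious, read_sel) are made up for illustration and are not part of
any Dell or IPMI tool.

```python
import shutil
import subprocess

# Words that often appear in hardware fault events. This list is an
# assumption for illustration, not an official Dell or IPMI vocabulary.
SUSPECT_WORDS = ("fail", "critical", "ecc", "voltage", "temperature", "fan", "power")


def parse_sel(text):
    """Split pipe-delimited event lines into lists of stripped fields."""
    events = []
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip anything that does not look like an event record
        events.append([field.strip() for field in line.split("|")])
    return events


def suspicious(events, words=SUSPECT_WORDS):
    """Return the events whose text mentions any of the given words."""
    hits = []
    for fields in events:
        joined = " ".join(fields).lower()
        if any(word in joined for word in words):
            hits.append(fields)
    return hits


def read_sel():
    """Run `ipmitool sel list`; return its output, or None if unavailable."""
    if shutil.which("ipmitool") is None:
        return None
    result = subprocess.run(
        ["ipmitool", "sel", "list"], capture_output=True, text=True
    )
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    text = read_sel()
    if text is None:
        print("ipmitool not available or the BMC could not be queried")
    else:
        for fields in suspicious(parse_sel(text)):
            print(" | ".join(fields))
```

Run it as root on a node after a failure and power cycle. The keyword filter is only
a convenience; with a short log it is simpler to read every entry.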

Tell us what software distribution you are running and any changes that might have
been made (no matter how small). What is the default runlevel (is X11 active or not)?
Are power saving options enabled in the BIOS?

Also, what hardware monitoring software are you running? I have seen system admins add
their own package to a system only to find that RHEL has an equivalent package
that uses different device drivers for the same hardware, with impossible-to-diagnose
results. Are you running a custom kernel?

Disable cpuspeed, hardware monitor and hardware control software to see if stability changes.
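
To confirm that frequency scaling is really out of the picture after stopping
cpuspeed, one option is to read the cpufreq governor for each CPU from sysfs. This
is a minimal sketch, assuming a kernel that exposes the usual cpufreq files under
/sys/devices/system/cpu; the function name read_governors is made up for
illustration.

```python
import glob
import os


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its cpufreq governor.

    CPUs without a cpufreq directory (no frequency scaling driver active)
    are simply left out of the result.
    """
    governors = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        # path is <root>/cpuN/cpufreq/scaling_governor; go up two levels for cpuN
        cpu = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as handle:
            governors[cpu] = handle.read().strip()
    return governors


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("no cpufreq governors found; frequency scaling appears inactive")
    for cpu, governor in found.items():
        print(cpu, governor)
```

Comparing the output before and after disabling the service on one node shows
whether the change actually took effect before you wait days for the next crash.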

What additional hardware is in the chassis?

The PowerEdge indicator turning orange tells me that the system itself detected the
problem, so there should be a hint in the BMC log. The orange state is sticky and
needs to be cleared.


-- 
	T o m  M i t c h e l l 
	Found me a new hat, now what?



