[Beowulf] PowerEdge SC 1435: Unexplained Crashes.
Rob Lines rlinesseagate at gmail.com
Thu Oct 9 13:18:09 PDT 2008
On Thu, Oct 9, 2008 at 3:20 PM, Rahul Nabar <rpnabar at gmail.com> wrote:
> I have a PowerEdge SC 1435 that has a strange problem. We bought about 23 of
> these for a cluster and machines have been failing in a somewhat random manner
> in a peculiar way:
>
> (1) Screen is blank. Front blue indicator turns steady orange.
>
> (2) Cannot get it to reboot by pressing (or keeping depressed) the power button
>
> (3) only way to reboot is to cycle the power.
>
> (4) After reboot machine works fine again, till after a few days same failure.
>
> Ran the dset and diagnostic CD but nothing relevant.
>
> Any tips what could be the faulty component? Or debug ideas? Right now I'm
> totally lost! Hardware / software? CPU / Motherboard / Power supply?
>

Have you checked the baseboard management log to see if it is throwing an
error?

Also check the temperature of the machines. We have had some pretty weird
issues with RAM and CPU quirkiness when they reach a high internal
temperature. If you can do some polling using IPMI on the nodes to record the
current temperature and fan data over time, you could see what the readings
were just before a crash, and you might be able to trace the failures to an
environmental cause.

Hope this helps,
Rob
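The polling idea above could be sketched roughly as follows. This is only an illustration, not something from the original message: it assumes `ipmitool` is installed on each node and can reach the local BMC, that `ipmitool sdr type Temperature` and `ipmitool sdr type Fan` print pipe-delimited lines of the form `name | id | status | entity | reading`, and the log file name and 60-second interval are arbitrary placeholders. The baseboard management log itself can usually be dumped separately with `ipmitool sel list`.

```python
import csv
import os
import subprocess
import time
from datetime import datetime, timezone


def parse_sdr_line(line):
    """Split one pipe-delimited sensor line into (name, status, reading).

    Assumes the layout 'name | id | status | entity | reading';
    returns None for lines that do not have at least five fields.
    """
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 5:
        return None
    return fields[0], fields[2], fields[4]


def read_sensors(sensor_type):
    """Run ipmitool for one sensor type and return the parsed rows."""
    result = subprocess.run(
        ["ipmitool", "sdr", "type", sensor_type],
        capture_output=True, text=True, check=True,
    )
    rows = (parse_sdr_line(line) for line in result.stdout.splitlines())
    return [row for row in rows if row is not None]


def poll(log_path="ipmi_sensors.csv", interval_s=60):
    """Append timestamped temperature and fan readings until interrupted.

    log_path and interval_s are hypothetical defaults for this sketch.
    """
    with open(log_path, "a", newline="") as fh:
        writer = csv.writer(fh)
        while True:
            stamp = datetime.now(timezone.utc).isoformat()
            for sensor_type in ("Temperature", "Fan"):
                try:
                    readings = read_sensors(sensor_type)
                except (OSError, subprocess.CalledProcessError) as exc:
                    # Record the failure instead of stopping the logger.
                    readings = [("ipmitool", "error", str(exc))]
                for name, status, reading in readings:
                    writer.writerow([stamp, sensor_type, name, status, reading])
            # Force the rows to disk so they survive a hard power cycle.
            fh.flush()
            os.fsync(fh.fileno())
            time.sleep(interval_s)


if __name__ == "__main__":
    poll()
```

After a crash, the last few rows in the log show the temperature and fan state shortly before the node hung. Writing the log to a location that is still readable while the node is down (for example a shared filesystem) would make it easier to compare nodes, though that choice is left open here.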
