[Beowulf] Resolved - Approach For Diagnosing Heat Related Failure?

Jon Forrest jlforrest at berkeley.edu
Tue Jul 28 16:54:56 PDT 2009

It turned out that the cause of the heat-related
failure I posted about a couple of weeks ago
was indeed bad memory.

I did try all the suggestions about making sure
the fans and the heat sinks were working properly.

The BIOS showed that all temperatures were well
within the proper range.

This was a strange failure in that memtest
was of no use: it crashed without reporting
any errors. That memtest couldn't run wasn't
in itself a sign of bad memory, since it could
crash for many other reasons.

To the commenter who mentioned the fact that my office is
cooler than our computer room - this is a sad fact
about the financial state of the Univ. of Calif. these days.

Anyway, thanks for all the suggestions.

Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA
jlforrest at berkeley.edu
