[Beowulf] Approach For Diagnosing Heat Related Failure?
Jon Forrest jlforrest at berkeley.edu
Tue Jul 21 11:56:23 PDT 2009
I have a rack full of identical compute nodes. One of them has become heat sensitive: when it's in the warm computer room, it crashes. I can't even run memtest from the CentOS DVD for 2 seconds before it crashes. However, when this node is in my much cooler office, everything works fine. All the other nodes are working fine in the computer room.

I'm not convinced the problem is actually the memory. Other than opening the node to spray cooling liquid while it's in the warm room, what approach would you use to figure out which component(s) is (are) failing?

Cordially,
--
Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA 94720-1460
510-643-1032
jlforrest at berkeley.edu
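One way to check whether the crashes track temperature is to log the kernel's hardware-monitoring sensors to disk until the node dies, then compare the last readings with those from the same node running in the cool office. The following is a minimal sketch under several assumptions: Python 3.6 or later, a Linux kernel with the relevant hwmon drivers loaded, and sensors exposed as temp*_input files (integers in millidegrees Celsius) under /sys/class/hwmon/hwmonN/. On some older kernels those files sit one level deeper, under hwmonN/device/, so the glob pattern may need adjusting. The function names and log path are hypothetical.

```python
import glob
import os
import time


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"hwmonN:chip/label": degrees_C} for every temp*_input under root.

    Assumes the common Linux hwmon sysfs layout, where each reading is an
    integer in millidegrees Celsius.
    """
    temps = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        chip_dir = os.path.dirname(path)
        hwmon = os.path.basename(chip_dir)  # e.g. "hwmon0"; keeps keys unique
        sensor = os.path.basename(path)[: -len("_input")]  # e.g. "temp1"

        # Chip name (e.g. "coretemp"), if the driver provides one.
        chip = "?"
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip() or chip
        except OSError:
            pass

        # Optional human-readable label (e.g. "Core 0").
        label = sensor
        try:
            with open(os.path.join(chip_dir, sensor + "_label")) as f:
                label = f.read().strip() or sensor
        except OSError:
            pass

        try:
            with open(path) as f:
                millideg = int(f.read().strip())
        except (OSError, ValueError):
            continue  # sensor unreadable right now; skip it
        temps[f"{hwmon}:{chip}/{label}"] = millideg / 1000.0
    return temps


def log_forever(logfile, interval=2.0, root="/sys/class/hwmon"):
    """Append one timestamped line of readings every `interval` seconds."""
    with open(logfile, "a") as out:
        while True:
            temps = read_hwmon_temps(root)
            fields = " ".join(f"{k}={v:.1f}" for k, v in sorted(temps.items()))
            out.write(time.strftime("%Y-%m-%d %H:%M:%S") + " " + fields + "\n")
            # Push each line to disk so the last readings survive a hard crash.
            out.flush()
            os.fsync(out.fileno())
            time.sleep(interval)
```

For example, start log_forever("/var/tmp/temps.log") in the background on a local or network disk, then put the node under load in its normal OS. memtest from the DVD boots without an OS, so it cannot run alongside a logger like this. The fsync after every line is the main design choice: without it, the readings taken just before a lockup could be lost in the page cache.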
