[Beowulf] Approach For Diagnosing Heat Related Failure?


Dmitry Zaletnev dzaletnev at yandex.ru
Tue Jul 21 16:04:28 PDT 2009


Jon,

> I have a rack full of identical compute
> nodes. One of them has become heat sensitive.
> 
> When it's in the warm computer room it crashes.
> I can't even run memtest from the CentOS DVD
> for 2 seconds. However, when this node is
> in my much cooler office everything works
> fine. All the other nodes are working fine
> in the computer room.
I had the same problem when one of the plastic clips
that mount the base ring of the CPU cooler broke
and the cooler was held on by only the remaining three clips.
When I started saving a virtual machine that was compiling
OpenFOAM from source, Ubuntu shut down because of
overheating.
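
A loosely mounted cooler like that should show up as a CPU temperature
that climbs quickly under load. Here is a minimal sketch for watching it,
assuming a Linux node whose kernel exposes sensors under /sys/class/hwmon
as temp*_input files in millidegrees Celsius. The function names are my
own for illustration and do not come from any existing tool.

```python
import glob
import os
import time


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"hwmonN/tempM": degrees_C} for each readable temp*_input file."""
    temps = {}
    pattern = os.path.join(root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                millideg = int(f.read().strip())
        except (OSError, ValueError):
            continue  # sensor unreadable or not a number; skip it
        chip = os.path.basename(os.path.dirname(path))
        sensor = os.path.basename(path)[: -len("_input")]
        temps[f"{chip}/{sensor}"] = millideg / 1000.0
    return temps


def hottest(temps):
    """Return (label, degrees_C) for the hottest sensor, or None if no sensors."""
    if not temps:
        return None
    return max(temps.items(), key=lambda kv: kv[1])


def log_hottest(count, interval=5.0, root="/sys/class/hwmon"):
    """Print and return `count` samples of the hottest sensor, `interval` s apart."""
    samples = []
    for i in range(count):
        reading = hottest(read_hwmon_temps(root))
        samples.append(reading)
        print(time.strftime("%H:%M:%S"), reading)
        if i + 1 < count:
            time.sleep(interval)
    return samples
```

Running something like log_hottest(60) on the suspect node and on a healthy
neighbour while both are under the same load would show whether one CPU runs
much hotter than the other in the same room.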
> 
> I'm not convinced the problem is actually
> the memory. Other than opening the node
> to spray cooling liquid when it's in the warm
> room, what approach would you use to figure out which
> component(s) is(are) failing?
> 
> Cordially,
> -- 
> Jon Forrest
> Research Computing Support
> College of Chemistry
> 173 Tan Hall
> University of California Berkeley
> Berkeley, CA
> 94720-1460
> 510-643-1032
> jlforrest at berkeley.edu

Sincerely,
Dmitry



