[Beowulf] 96 cores in silent and small enclosure

Mark Hahn hahn at mcmaster.ca
Tue Apr 13 10:37:22 PDT 2010

> I find it strange with this rather large temp range, and 55 seems very low to 
> my experience. Could they possibly stand for something else? Did not find any 
> description of the numbers anywhere on that address.

I think you should always worry about any temperature measured 
on a system that's at 65C or higher.  as Jim mentioned, the temps
that matter are actually on-chip and not really accessible - 
and it's unknown to us what they should be anyway, or how long 
they can tolerate particular temps.  and whether over-temp failure
modes would be transient (conductivity in semiconductors changes 
rapidly as a function of temperature) or gradual (electromigration
or perhaps the solder-ball problems nvidia had)...

the original question was about whether 60-65C is a safe operating
temperature.  I think it's pretty clearly high - whether it's critical
depends on how it's measured, the specific chip's specs, etc.
but it's not the sort of operating range I'd be aiming for.
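
for anyone who wants to see what their own nodes report, here is a minimal
sketch, assuming a Linux box whose sensor drivers expose readings under
/sys/class/hwmon as temp*_input files in millidegrees C.  the 65C limit is
just the figure from this thread, not a vendor spec, and the names
read_temps/hot_sensors are made up for illustration.  what each sensor
physically measures (on-die, socket, board) is driver-specific, so treat the
numbers as rough.

```python
import glob
import os

WARN_C = 65.0  # threshold from this thread; an assumption, not a chip spec


def read_temps(base="/sys/class/hwmon"):
    """Return {sensor_path: degrees C} for every readable temp*_input file."""
    temps = {}
    pattern = os.path.join(base, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                # hwmon reports temperatures in millidegrees Celsius
                temps[path] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            # sensor missing, unreadable, or returning junk: skip it
            continue
    return temps


def hot_sensors(temps, limit=WARN_C):
    """Return only the sensors reading at or above the limit."""
    return {path: t for path, t in temps.items() if t >= limit}


if __name__ == "__main__":
    for path, t in sorted(read_temps().items()):
        flag = "  <-- worth a look" if t >= WARN_C else ""
        print(f"{path}: {t:.1f}C{flag}")
```

run it on a loaded node rather than an idle one; idle readings say little
about the operating range that matters.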
