Maximum room temperature
Robert G. Brown
rgb at phy.duke.edu
Mon Apr 22 11:58:44 PDT 2002
On Mon, 22 Apr 2002, Manel Soria wrote:
> I'm wondering what is the maximum reasonable ambient
> temperature to have in a cluster room. In our room
> with 72 nodes we have about 29-30 oC (84-86 oF).
> Is this too high ? Can this be the cause of hardware
> failures ?
Yes, it can. This is pretty high for a server room.
The best way to think of temperature and heat disposal in a cluster is
to think in layers. Heat generally flows from hot to cold, at a rate
proportional to the difference in temperature in kelvins. More
specifically, the rate of flow is influenced by things like
conductivities, convective flow, and radiative trapping.
The CPU core generates heat at some roughly constant rate under load.
Current/modern CPUs "can" operate at very high temperatures, on the order of
100C, although they will almost certainly operate more reliably and
longer at considerably cooler core temperatures.
This heat generally flows from the CPU into the attached heat sink/fan
at a rate determined by the temperature DIFFERENCE between the heatsink
and the CPU. If the conductivity of the heatsink is high, and the
conductivity of the interface is also high, a small temperature
difference will cause a lot of heat to flow from the hotter to the
cooler. The CPU is thus cooled until it isn't too much warmer than the
operating temperature of the heatsink.
The heatsink then has to be cooled so that IT is cooler than the desired
operating temperature of the CPU. The hotter it is, the faster it loses
heat to the ambient air. The cooler the ambient air, the faster it
loses heat. Here things get a bit arcane. Air is not all that great a
conductor of heat. It does have some heat capacity and will warm up
when in contact with a warmer surface. Heat sinks therefore generally
have lots of surface area and fans in the case and heatsink itself move
(hopefully cooler) air rapidly across this surface. All things being
equal, though, when the CPU produces heat at a constant rate the
heatsink/fan/air arrangement can remove heat at that rate only when the
air and the heatsink have a given, approximately constant, temperature
difference.
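To put rough numbers on the heat capacity argument above: the air stream
carries away power equal to (mass flow) x (specific heat) x (temperature
rise). Here is a minimal Python sketch of that relation. The air properties
are approximate room-temperature, sea-level values; the 100 W load and 10C
rise are made-up illustration numbers, not measurements of any real node.

```python
# Rough airflow estimate: how much air must pass through a case to carry
# away a given heat load with a given air temperature rise.
AIR_DENSITY = 1.2         # kg/m^3, approximate, room temperature at sea level
AIR_SPECIFIC_HEAT = 1005  # J/(kg*K), approximate
M3S_TO_CFM = 2118.88      # cubic feet per minute in one m^3/s


def airflow_m3_per_s(heat_watts, air_rise_kelvin):
    """Airflow needed so exhaust air is air_rise_kelvin warmer than intake."""
    if air_rise_kelvin <= 0:
        raise ValueError("air must warm up to carry heat away")
    return heat_watts / (AIR_DENSITY * AIR_SPECIFIC_HEAT * air_rise_kelvin)


if __name__ == "__main__":
    # Hypothetical node: 100 W of heat, exhaust 10C warmer than intake.
    flow = airflow_m3_per_s(100.0, 10.0)
    print(f"{flow:.4f} m^3/s = {flow * M3S_TO_CFM:.1f} CFM")
```

Note that halving the allowed temperature rise doubles the airflow you
need, which is why warm intake air is so hard to compensate for with fans
alone.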
This warmed air then has to be removed from the case and replaced with
cooler ambient air from the server room, and the warmed air eventually
has to be circulated over actively cooled (refrigerated) coils to remove
the heat from the room altogether and eventually dump it, plus all the
energy required to do the cooling, into the outside air.
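A back-of-envelope version of that last step, as a hedged sketch: every
watt the nodes draw ends up as heat the air conditioning must remove. The
unit conversions are standard; the 150 W per node and the coefficient of
performance (COP) of 3 are assumptions for illustration only, so measure
your own nodes and check your own cooling unit's rating.

```python
# Back-of-envelope air-conditioning load for a cluster room.
WATTS_TO_BTU_PER_HR = 3.412    # 1 W is about 3.412 BTU/hr
BTU_PER_HR_PER_TON = 12000.0   # 1 ton of refrigeration = 12,000 BTU/hr


def cooling_load(nodes, watts_per_node):
    """Return (watts, BTU/hr, tons of refrigeration) of heat produced."""
    watts = nodes * watts_per_node
    btu_per_hr = watts * WATTS_TO_BTU_PER_HR
    return watts, btu_per_hr, btu_per_hr / BTU_PER_HR_PER_TON


def heat_rejected_outside(heat_watts, cop):
    """Heat dumped outdoors: the heat removed plus the work spent removing it."""
    return heat_watts * (1.0 + 1.0 / cop)


if __name__ == "__main__":
    # 72 nodes as in the question; 150 W per node is an assumed figure.
    watts, btu, tons = cooling_load(72, 150.0)
    print(f"{watts:.0f} W = {btu:.0f} BTU/hr = {tons:.2f} tons")
    # COP of 3.0 is an assumed, illustrative value.
    print(f"rejected outdoors: {heat_rejected_outside(watts, 3.0):.0f} W")
```

With those assumed numbers the room needs a bit over 3 tons of cooling
capacity, before any allowance for lights, people, or heat leaking in
through the walls.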
The cooler the room air, the cooler all the components inside your
system, especially the CPU. Cooling down the room air temperature 10C
should reduce the operating temperature of your CPU by very close to
10C.
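The layered picture can be sketched as a chain of thermal resistances: at
steady state each layer sits (power x resistance) above the layer outside
it. The following is a minimal Python sketch, assuming constant CPU power
and fixed fan speeds; the 60 W figure and the three resistance values are
hypothetical, chosen only to show the arithmetic.

```python
# Layered thermal model: each layer adds (power x thermal resistance) to
# the temperature of the layer outside it.
def cpu_temperature(ambient_c, power_w, resistances_c_per_w):
    """Steady-state CPU temperature for a chain of resistances (C per W)."""
    return ambient_c + power_w * sum(resistances_c_per_w)


# Hypothetical values: CPU-to-heatsink interface, heatsink-to-case-air,
# and case-air-to-room resistances, in C per W.
LAYERS = [0.1, 0.4, 0.2]
POWER_W = 60.0  # assumed CPU heat output

if __name__ == "__main__":
    for ambient in (30.0, 20.0):
        temp = cpu_temperature(ambient, POWER_W, LAYERS)
        print(f"room {ambient:.0f}C -> CPU about {temp:.1f}C")
```

In this model the CPU runs at about 72C in a 30C room and about 62C in a
20C room: the whole stack of temperature differences just rides up and
down with the room air.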
Most systems are probably engineered with the assumption that they will
operate in air in the 68-75F temperature range (20-24C), and can
probably tolerate ambient air up to 80F or 27C without much risk. If
the ambient temperatures get much higher than this, though, your risk of
catastrophic heat-induced failure starts creeping up. At around
100F/38C it becomes very high indeed -- close to "certain" if you try
operating a system 24 hours under a high load at or above this ambient
air temperature. If a system is ever operated for an extended period
over 32C (in the 90s F) it may not fail, but even if you cool it back
down you may have marginally damaged components that will fail later.
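For anyone scripting a room-temperature check, here is a small Python
sketch that converts Celsius to Fahrenheit and sorts a reading into the
rough bands described above (75F, 80F, and 100F). The band names and the
function names are made up for this example, and the thresholds are rules
of thumb, not vendor specifications.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def ambient_risk(celsius):
    """Classify a room temperature using rough rule-of-thumb bands."""
    fahrenheit = c_to_f(celsius)
    if fahrenheit <= 75.0:
        return "normal"
    if fahrenheit <= 80.0:
        return "tolerable"
    if fahrenheit < 100.0:
        return "elevated risk"
    return "very high risk"


if __name__ == "__main__":
    for reading in (22.0, 26.0, 30.0, 38.0):
        print(f"{reading:.0f}C = {c_to_f(reading):.1f}F: {ambient_risk(reading)}")
```

A 29-30C room lands in the "elevated risk" band by this scheme.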
An additional risk for even fairly short periods of high temperature
operation is that hard disks are made of metal that expands when heated.
If a disk expands too much, the write head can actually become
misaligned with the tracks and your disk can be instantly and
irrecoverably trashed. This can also happen if the disk is COOLED too
much -- it is a bad idea to crank up a laptop after it has sat all night
in a sub-zero car without letting it come to a "normal" operating
temperature first...
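To get a feel for the sizes involved, linear expansion goes as
(coefficient) x (length) x (temperature change). This is a rough Python
sketch only: it assumes an aluminum platter, a 47.5 mm radius (a 95 mm
platter), and uniform expansion, and it ignores whatever compensation the
drive's own servo does.

```python
# Linear thermal expansion: delta_L = alpha * L * delta_T.
ALUMINUM_ALPHA = 23e-6  # per kelvin, approximate for aluminum alloys


def radial_expansion_um(radius_mm, delta_t_kelvin, alpha=ALUMINUM_ALPHA):
    """Change in platter radius, in micrometres, for a temperature change."""
    radius_um = radius_mm * 1000.0
    return alpha * radius_um * delta_t_kelvin


if __name__ == "__main__":
    # Assumed 47.5 mm radius, warmed by 20C and cooled by 30C.
    for delta_t in (20.0, -30.0):
        change = radial_expansion_um(47.5, delta_t)
        print(f"{delta_t:+.0f}C -> {change:+.2f} micrometres at the outer edge")
```

A 20C swing moves the outer edge of such a platter by about 22
micrometres, and a negative temperature change shrinks it by the same
rule, which is the cold-laptop case.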
If I were you I'd engineer enough cooling to drop the ambient air in
your cluster space by at least 5C, if not 10C, and make sure that there
is enough air circulation and mixing that no systems are in local "hot
spots" (where air exhausted from one system is sucked into another
system, for example). A really happy server room is one you need to wear
a jacket or sweater in to be comfortable, not one that makes you want to
take clothes off...;-)
rgb
>
> Thanks.
>
> --
> ===============================================
> Dr. Manel Soria
> ETSEIT - Centre Tecnologic de Transferencia de Calor
> C/ Colom 11 08222 Terrassa (Barcelona) SPAIN
> Tf: +34 93 739 8287 ; Fax: +34 93 739 8101
> E-Mail: manel at labtie.mmt.upc.es
>
>
>
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu