[Beowulf] Cooling vs HW replacement

Jim Lux james.p.lux at jpl.nasa.gov
Sun Jan 16 22:16:53 PST 2005


----- Original Message -----
From: "Ariel Sabiguero" <asabigue at fing.edu.uy>
To: <beowulf at beowulf.org>
Sent: Sunday, January 09, 2005 5:09 AM
Subject: [Beowulf] Cooling vs HW replacement


> Hello all.
> The following question shall only consider costs, not uptime or
> reliability of the solution.
> I need to balance the costs of hardware replacement after failures against
> air conditioning costs.
> The question arises as most current hardware comes with 3 or more years
> of warranty. During that period, by Moore's law, hardware performance
> doubles twice... is it worth spending money cooling down a cluster, or just
> rebuilding it after it "burns out" (and is at least 4 times slower than
> the state of the art)?
> Is it worth cooling down the room to a Class A Computer room standard, or
> better to save the money for a hardware upgrade after three years? In warm
> countries, keeping the air inside a (PC-heated) room at 18ºC when the
> outside temperature averages 30ºC makes for pretty expensive electricity
> bills. It is cheaper to "circulate" 30ºC air and have 40-50ºC inside the
> chassis.

Fascinating system design question....

>
> Do you have figures or graphs plotting MTBF vs temperature for main
> system components (memory, CPU, mainboard, HDD) ?

Such data is very hard to come by. However, a good rule of thumb is that
life (MTBF) is halved for every 10 degree (C) temperature rise (from the
Arrhenius equation).  I have seen temperature vs MTBF data for disk drives;
a Google search, or a search of a site such as Seagate's, should find it.
Of course, they do accelerated life testing at elevated temperatures, so
there must be some analysis that equates X hours of operation at
temperature Y to Z hours of operation at temperature Y-30.  The real
question is which component is life-limiting... I'd be willing to gamble
(based on personal experience with PC failures over the last 20 years) that
it's some component in the power supply.
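
To put rough numbers on the halving rule of thumb, here is a minimal Python
sketch.  It assumes only the "halved per 10 degrees C" rule, not a real
Arrhenius fit with a measured activation energy, and the function names
(mtbf_scale, equivalent_hours) are made up for illustration.

```python
def mtbf_scale(delta_t_c, halving_step_c=10.0):
    """Relative MTBF after a temperature rise of delta_t_c degrees C.

    Assumption: life halves for every halving_step_c degrees of rise
    (the rule of thumb, not vendor data).
    """
    return 0.5 ** (delta_t_c / halving_step_c)


def equivalent_hours(test_hours, delta_t_c, halving_step_c=10.0):
    """Hours at the cooler temperature that the same rule would equate
    to test_hours run delta_t_c degrees hotter."""
    return test_hours / mtbf_scale(delta_t_c, halving_step_c)


if __name__ == "__main__":
    for rise in (0, 10, 20, 30):
        print(f"+{rise:2d} C rise: relative MTBF {mtbf_scale(rise):.3f}")
    # 1000 test hours at Y would map to this many hours at Y-30:
    print(equivalent_hours(1000, 30))
```

Under this rule a 30 degree rise cuts MTBF to one eighth, so 1000 hours at
temperature Y would correspond to roughly 8000 hours at Y-30.  Real
components will deviate from this, which is why the manufacturer's data is
worth digging up.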


> Links to this information are highly appreciated!
> I remember old hardware (40MB RLL disks) shipping this information with
> the device (several pages of printed manual), showing the difference in
> MTBF vs environmental conditions, but nowadays commodity hardware does
> not put this on the sticker on top of the device...

But it is available on manufacturers' websites, at least for some components.


Jim Lux



