[Beowulf] 96 cores in silent and small enclosure

Mark Hahn hahn at mcmaster.ca
Tue Apr 13 22:14:58 PDT 2010

>> the original question was about whether 60-65C is a safe operating
>> temperature.  I think it's pretty clearly high - whether it's critical
>> depends on how it's measured, the specific chip's specs, etc.
>> but it's not the sort of operating range I'd be aiming for.
> But it should be possible to save money by running hotter. Suppose you

sure: move more air and/or provide a lower thermal-resistance heatsink.
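
both of those knobs act on the same simple relation: die temperature is roughly ambient plus power times the total junction-to-ambient thermal resistance. a minimal sketch follows; the function name and all the numbers (25C inlet, 95W, 0.4 and 0.3 C/W) are made up for illustration, not taken from any datasheet.

```python
def junction_temp_c(ambient_c, power_w, theta_ja_c_per_w):
    """Steady-state die temperature estimate in degrees Celsius.

    theta_ja_c_per_w is the total junction-to-ambient thermal resistance
    (heatsink plus interface plus package), in degrees C per watt.
    """
    return ambient_c + power_w * theta_ja_c_per_w


# Hypothetical numbers, chosen only to show the effect of each knob.
stock = junction_temp_c(ambient_c=25.0, power_w=95.0, theta_ja_c_per_w=0.4)
better_sink = junction_temp_c(ambient_c=25.0, power_w=95.0, theta_ja_c_per_w=0.3)
warmer_room = junction_temp_c(ambient_c=35.0, power_w=95.0, theta_ja_c_per_w=0.4)

print(f"stock heatsink, 25C room:  {stock:.1f}C")
print(f"better heatsink, 25C room: {better_sink:.1f}C")
print(f"stock heatsink, 35C room:  {warmer_room:.1f}C")
```

more airflow shows up as a lower effective thermal resistance, while a warmer room shifts the die temperature up degree for degree.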

> could accept 10 degrees higher temp, then you would not have to run the AC in 
> the room as hard (and AC represents a significant part of the operating

the max temp spec is not some arbitrary knob that the chip vendors
choose out of spiteful anti-green-ness.  I wouldn't be surprised to see some
upward change in coming years, but the issues here are nontrivial.  do you
still want the chip to operate correctly at 20C as well as 90C?  we're talking
fairly big changes like lower doping or non-silicon materials.

> cost). If the price you pay is that your CPUs will only last for 4 years (I'm 
> just speculating here, and for the moment only consider the cpu) instead of 
> 10 years it would probably be an economically much better option.

the problem is that failure rates are pretty nonlinear.  my guess is that 
undercooling (or overvolting/overclocking) will increase your early failure 
rate as well as putting you on a pretty steep part of the failure curve by
year three.
I expect a low failure rate for a server well past 5 years (cold room,
server-level cooling, 100% duty cycle.)  but the fact that chip vendors 
do 3-year warranties makes me think that going to 4 years would cost 
them significantly more...
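
to put a rough number on "nonlinear": a common rule-of-thumb model for thermally activated wear-out is the Arrhenius equation. the sketch below is only illustrative. the 0.7 eV activation energy and the 55C baseline are assumptions picked for the example, not vendor data, and real chips have several failure mechanisms with different activation energies.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_cool_c, t_hot_c, activation_energy_ev=0.7):
    """Factor by which a thermally activated failure mechanism speeds up
    when running at t_hot_c instead of t_cool_c (both in degrees Celsius).

    The default activation energy is an assumed, illustrative value.
    """
    t_cool_k = t_cool_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_cool_k - 1.0 / t_hot_k
    )
    return math.exp(exponent)


# Assumed 55C baseline; compare against progressively hotter operation.
for hot_c in (65, 75, 85):
    factor = arrhenius_acceleration(55, hot_c)
    print(f"55C -> {hot_c}C: wear-out roughly x{factor:.1f} faster")
```

under those assumptions each extra 10C roughly doubles the wear-out rate, so the cost of running hotter compounds rather than growing in proportion to the temperature.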
