[Beowulf] 96 cores in silent and small enclosure

Mark Hahn hahn at mcmaster.ca
Sun Apr 11 19:59:04 PDT 2010

> Have done some preliminary tests on the system. Indicates a CPU temperature 
> of 60-65 C after half an hour (will do longer test soon). Have a few

that's pretty hot.  some servers will shut down at 65 C (our DL145g2's,
for instance).  of course, the metric is poorly defined: is that a 
thermistor under the CPU, or a sensor on the die itself?

> * How high cpu temperatures are acceptable (our cluster is built on 6 core 
> AMD opterons)?

well, you can look up the max operating spec for your particular chips.
for instance, http://products.amd.com/en-us/OpteronCPUResult.aspx
shows that OS8439YDS6DGN includes chips with max temperature ratings 
anywhere from 55 to 71 C.  (there must be some further package marking 
to determine which temp spec applies...)
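
to make that concrete, here is a minimal sketch (python) that compares a
reading against the 55-71 C range quoted above.  the function name and
the three-way wording are made up for illustration, and it assumes the
sensor reading and the rating refer to the same kind of temperature,
which may not be true.

```python
def compare_to_rating(temp_c, lowest_max_c=55.0, highest_max_c=71.0):
    """Compare a reading with the range of max-temperature ratings.

    The defaults are the 55-71 C range quoted above for OS8439YDS6DGN.
    Which rating applies to a given chip is unknown here, so a reading
    inside the range cannot be classified either way.
    """
    if temp_c <= lowest_max_c:
        return "within spec for every variant"
    if temp_c <= highest_max_c:
        return "depends on which variant you have"
    return "over spec for every variant"


if __name__ == "__main__":
    # the 60-65 C readings reported in the original question
    for reading in (60.0, 65.0):
        print(reading, compare_to_rating(reading))
```

both reported readings land in the "depends" bucket, which is why the 
package marking matters.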

> I know life span is reduced if temperature is high, but due to 
> performance reasons life span of a CPU is pretty short anyway.

if you operate the chip within spec, you should expect the lifespan
to be plenty long (basically indefinite, but let's say 10 years...)

> * I used lm-sensors to check the temp, how accurate is that?

it's just reporting registers; that is not to say that lm-sensors is 
necessarily interpreting them correctly.  otoh, lm-sensors appears to 
be willing to offer some metadata as well (critical temp settings).
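
if you want to see the raw values yourself, a rough sketch (python) that
reads the kernel's hwmon sysfs files is below.  it assumes a linux box
whose sensor driver exposes temp*_input and temp*_crit (millidegrees C)
under /sys/class/hwmon; the function names are made up, and whether the
numbers mean what you think is still up to the driver and the board.

```python
import glob
import os


def _read_millideg(path):
    """Read one sysfs file holding millidegrees C; None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip()) / 1000.0
    except (OSError, ValueError):
        return None


def read_hwmon_temps(base="/sys/class/hwmon"):
    """Return a list of (input_path, temp_c, crit_c) tuples.

    crit_c is None when the driver does not expose a temp*_crit file.
    """
    # assumption: some kernels keep the files under hwmonN/device/
    patterns = [
        os.path.join(base, "hwmon*", "temp*_input"),
        os.path.join(base, "hwmon*", "device", "temp*_input"),
    ]
    readings = []
    for pattern in patterns:
        for input_path in sorted(glob.glob(pattern)):
            temp_c = _read_millideg(input_path)
            if temp_c is None:
                continue
            crit_path = input_path[: -len("_input")] + "_crit"
            readings.append((input_path, temp_c, _read_millideg(crit_path)))
    return readings


if __name__ == "__main__":
    for path, temp_c, crit_c in read_hwmon_temps():
        crit = "n/a" if crit_c is None else "%.1f C" % crit_c
        print("%s: %.1f C (crit: %s)" % (path, temp_c, crit))
```

comparing this against what the sensors command prints shows whether 
any scaling or offset is being applied on top of the raw register.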

> * Would there be a market potential for a system like this? I naturally tend

the more specialized the product, the smaller the market.  there are lots 
of mainstream workstations which are fairly quiet.  I've even seen some 
small deskside clusters that claimed to be quiet.  personally I don't 
think it makes much sense - I'd rather use an arbitrarily noisy cluster
remotely, from a quiet and wimpy desktop.
