[Beowulf] 96 cores in silent and small enclosure

Lux, Jim (337C) james.p.lux at jpl.nasa.gov
Mon Apr 12 15:24:34 PDT 2010

> -----Original Message-----
> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Jon Tegner
> Sent: Monday, April 12, 2010 11:02 AM
> To: Mark Hahn
> Cc: beowulf at beowulf.org
> Subject: Re: [Beowulf] 96 cores in silent and small enclosure
> > well, you can look up the max operating spec for your particular chips.
> > for instance, http://products.amd.com/en-us/OpteronCPUResult.aspx
> > shows that OS8439YDS6DGN includes chips rated 55-71.  (there must be
> > some further package marking to determine which temp spec...)
> >
> I find it strange with this rather large temp range, and 55 seems very
> low to my experience. Could they possibly stand for something else? Did
> not find any description of the numbers anywhere on that address.

The document Mark posted a link to this morning explains all.

That temperature is the max case temperature for a given power dissipation (TDP), heat sink, and ambient temperature. It also rolls in some other assumptions (such as the junction-to-case thermal resistance).
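
As a rough illustration of how those quantities hang together, here is a sketch of the usual steady-state thermal-resistance model: case temperature is ambient plus power times the case-to-ambient resistance, and junction temperature is case plus power times the junction-to-case resistance. The function names and every number below are made up for illustration. They are not taken from the AMD documents.

```python
# Steady-state thermal-resistance sketch. All values are hypothetical,
# chosen only to show the arithmetic; they are not AMD specifications.

def case_temp_c(ambient_c, power_w, theta_ca):
    """Case temperature: ambient plus the rise across the heat sink.

    theta_ca is the case-to-ambient thermal resistance in degC/W.
    """
    return ambient_c + power_w * theta_ca


def junction_temp_c(case_c, power_w, theta_jc):
    """Junction temperature: case plus the rise across the package.

    theta_jc is the junction-to-case thermal resistance in degC/W.
    """
    return case_c + power_w * theta_jc


def max_ambient_c(max_case_c, power_w, theta_ca):
    """Highest ambient that keeps the case at or below max_case_c."""
    return max_case_c - power_w * theta_ca


if __name__ == "__main__":
    power_w = 100.0    # assumed dissipation, W
    theta_ca = 0.25    # assumed heat sink resistance, degC/W
    theta_jc = 0.125   # assumed package resistance, degC/W

    t_case = case_temp_c(35.0, power_w, theta_ca)
    t_junction = junction_temp_c(t_case, power_w, theta_jc)
    print("case:", t_case, "junction:", t_junction)

    # Ambient headroom for two different max case temperature ratings.
    for rating_c in (55.0, 71.0):
        print("rating", rating_c, "-> max ambient",
              max_ambient_c(rating_c, power_w, theta_ca))
```

With the same assumed heat sink and power, a part with a lower max case rating simply leaves less ambient headroom, which is one reason the case number only means something alongside the other assumptions.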

The actual "max temp" limit you're designing to is Tctl Max, which looks like it's 70C for the most part.  The problem is that
"Tctl Max (maximum control temperature) is a non-physical temperature on an arbitrary scale that
can be used for system thermal management policies. Refer to the BIOS and Kernel Developer's
Guide (BKDG) For AMD Family 10h Processors, order #31116"

I think a fair amount of study is needed to really understand the thermal management of these devices.  In many ways, doing it for a modern processor is like doing it for a whole PC board with lots of parts.  You've got different functional blocks, all running at different speeds, some enabled, some disabled, so you can't just have a single rule like "keep the case at point X below temp Y".
