[Beowulf] 96 cores in silent and small enclosure
tegner at renget.se
Mon Apr 12 23:18:16 PDT 2010
On Apr 13, 2010 00:24 "Lux, Jim (337C)" <james.p.lux at jpl.nasa.gov> wrote:
> > -----Original Message-----
> > From: <beowulf-bounces at beowulf.org>
> > [mailto:beowulf-bounces at beowulf.org] On Behalf Of Jon Tegner
> > Sent: Monday, April 12, 2010 11:02 AM
> > To: Mark Hahn
> > Cc: <beowulf at beowulf.org>
> > Subject: Re: [Beowulf] 96 cores in silent and small enclosure
> > > well, you can look up the max operating spec for your particular
> > > chips. for instance,
> > > <http://products.amd.com/en-us/OpteronCPUResult.aspx>
> > > shows that OS8439YDS6DGN includes chips rated 55-71. (there must be
> > > some further package marking to determine which temp spec...)
> > >
> > I find this rather large temp range strange, and 55 seems very low
> > in my experience. Could the numbers possibly stand for something else?
> > I did not find any description of them anywhere at that address.
> The document Mark posted a link to this morning explains all.
> That temperature is the max case temperature given a certain power
> dissipation (TDP), heat sink, and ambient, and also rolls in some
> other assumptions (such as the thermal resistance from some junction
> to case).
> The actual "max temp" limit you're designing to is Tctl Max, which
> looks like it's 70C for the most part. The problem is that "Tctl Max
> (maximum control temperature) is a non-physical temperature on an
> arbitrary scale that can be used for system thermal management
> policies. Refer to the BIOS and Kernel Developer's Guide (BKDG) For
> AMD Family 10h Processors, order #31116"
> I think a fair amount of study is needed to really understand the
> thermal management of these devices. In many ways, doing it for a
> modern processor is like doing it for a whole PC board with lots of
> parts. You've got different functional blocks, all running at
> different speeds, some enabled, some disabled, so you can't just have
> a single "keep the case at point X below temp Y".
> Thanks for the information! Let's see if I understand this correctly:
> * The temperature reported to the BIOS is the Tctl temperature?
> * This "temperature" is non-physical, but the number is designed to be
> relevant to the cooling requirements of the CPU. That is, if this
> number is larger than Tctl Max, the CPU takes corrective action, e.g.
> throttling down?
> * If this number (Tctl) is below Tctl Max, chances are high that
> the CPU will live a happy life for many years? It would be stupid of
> AMD not to have designed this number with some margin to account for
> different cooling situations.
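
To make my reading of the points above concrete, here is a minimal Python sketch of that kind of policy. It is only an illustration of my understanding, not how the BIOS or the CPU actually implements it: the function names, the "warn" band, and the 10-unit margin are hypothetical, and the 70 is just the Tctl Max figure mentioned above. Since Tctl is on an arbitrary scale, the numbers are not treated as degrees C.

```python
# Sketch of a Tctl-based thermal policy (hypothetical names and thresholds).
# Tctl is a non-physical value on an arbitrary scale, so these are plain
# numbers rather than degrees Celsius.

TCTL_MAX = 70.0  # figure quoted in this thread; check the spec for your part


def headroom(tctl: float, tctl_max: float = TCTL_MAX) -> float:
    """Return how far the reported Tctl is below Tctl Max (negative if above)."""
    return tctl_max - tctl


def thermal_action(tctl: float, tctl_max: float = TCTL_MAX,
                   margin: float = 10.0) -> str:
    """Classify a Tctl reading.

    'throttle' : at or above Tctl Max, corrective action is expected
    'warn'     : within `margin` of Tctl Max (margin is an assumed value)
    'ok'       : comfortably below Tctl Max
    """
    if tctl >= tctl_max:
        return "throttle"
    if tctl >= tctl_max - margin:
        return "warn"
    return "ok"


if __name__ == "__main__":
    # Made-up readings, only to exercise the three branches.
    for reading in (40.0, 65.0, 72.0):
        print(reading, thermal_action(reading), headroom(reading))
```

The point is only that the comparison is made against Tctl Max on the same arbitrary scale, not against a physical case or junction temperature.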