<html><head></head><body style="font-size:10pt;font-family:Verdana,Arial,Helvetica,sans-serif;">On Apr 13, 2010 00:24 "Lux, Jim (337C)" <a href="mailto:james.p.lux@jpl.nasa.gov">&lt;james.p.lux@jpl.nasa.gov&gt;</a> wrote:<br><blockquote type="cite"><blockquote type="cite">-----Original Message-----<br>From: <a href="mailto:beowulf-bounces@beowulf.org">beowulf-bounces@beowulf.org</a> [mailto:beowulf-bounces@beowulf.org] On Behalf Of Jon Tegner<br>Sent: Monday, April 12, 2010 11:02 AM<br>To: Mark Hahn<br>Cc: <a href="mailto:beowulf@beowulf.org">beowulf@beowulf.org</a><br>Subject: Re: [Beowulf] 96 cores in silent and small enclosure<br><br><blockquote type="cite">well, you can look up the max operating spec for your particular chips.<br>for instance, <a href="http://products.amd.com/en-us/OpteronCPUResult.aspx">http://products.amd.com/en-us/OpteronCPUResult.aspx</a><br>shows that OS8439YDS6DGN includes chips rated 55-71. (there must be<br>some further package marking to determine which temp spec...)<br></blockquote>I find it strange with this rather large temp range, and 55 seems very<br>low to my experience. Could they possibly stand for something else? Did<br>not find any description of the numbers anywhere on that address.<br></blockquote><br><br> The document Mark posted a link to this morning explains all.<br><br> That temperature is the max case temperature given a certain power dissipation (TDP), heat sink, and ambient, and also rolls in some other assumptions (such as the thermal resistance from some junction to case)<br><br> The actual "max temp" limit you're designing to is Tctl Max, which looks like it's 70C for the most part. The problem is that "Tctl Max (maximum control temperature) is a non-physical temperature on an arbitrary scale that can be used for system thermal management policies. 
Refer to the BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h Processors, order #31116"<br><br> I think a fair amount of study is needed to really understand the thermal management of these devices. In many ways, doing it for a modern processor is like doing it for a whole PC board with lots of parts. You've got different functional blocks, all running at different speeds, some enabled, some disabled, so you can't just have a single "keep the case at point X below temp Y".<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">***************************************************************************</blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Thanks for the information! Let's see if I understand this correctly:</blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">* Is the temperature reported to the BIOS the Tctl temperature?</blockquote><blockquote type="cite">* This "temperature" is non-physical, but the number is designed to be relevant to the cooling requirements of the CPU. That is, if this number exceeds Tctl Max, the CPU takes corrective action, e.g. throttling down?</blockquote><blockquote type="cite">* If this number (Tctl) stays below Tctl Max, chances are high that the CPU will live a happy life for many years? It would be stupid of AMD not to have designed this number with some margin to account for different cooling situations.</blockquote></body></html>