[Beowulf] Problems with Dell M620 and CPU power throttling

Fri Aug 30 09:19:56 PDT 2013

On 08/30/2013 12:00 PM, Mark Hahn wrote:
>> Of course we have done system tuning.
>
> sorry for the unintentional condescension - what I actually meant was
> "tuning of knobs located in /sys" :)
>
>> Instrumenting temperature probes on individual CPUs has not been
>> performed. When we look at temperatures from both the chassis and
>> ipmitool, we see no drastic peaks.  Maybe we are getting a 60C peak
>> that we don't detect and that is the cause.  But I doubt it.
>
> could you try "modprobe coretemp", and see whether interesting things
> appear under:
> /sys/devices/system/cpu/cpu*/thermal_throttle/core_throttle_count
>
> afaik, reading the coretemp*/temp*_input values would let you do
> higher-resolution monitoring to see whether you're getting spikes.
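
The higher-resolution monitoring suggested above could be sketched roughly as
follows. This is a minimal illustration, not a tested monitoring tool: the
function names and the glob pattern are assumptions (the location of the
coretemp temp*_input files varies between kernel versions, so the pattern is
passed in by the caller). The hwmon sysfs files report millidegrees Celsius.

```python
import glob
import time


def read_temps_c(pattern):
    """Return {path: degrees C} for every sensor file matching `pattern`.

    `pattern` is a caller-supplied glob; the exact coretemp path is an
    assumption and depends on the kernel version.
    """
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                # hwmon temp*_input values are in millidegrees Celsius
                temps[path] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            # sensor vanished or returned something unparsable; skip it
            continue
    return temps


def track_peaks(pattern, samples, interval_s, peaks=None):
    """Poll the sensors `samples` times and keep the highest reading seen."""
    peaks = {} if peaks is None else peaks
    for _ in range(samples):
        for path, temp in read_temps_c(pattern).items():
            if temp > peaks.get(path, float("-inf")):
                peaks[path] = temp
        time.sleep(interval_s)
    return peaks
```

For example, track_peaks("/sys/devices/platform/coretemp.*/temp*_input", 600, 0.1)
would poll for about a minute at 10 Hz, assuming that path matches the sensor
files on the node in question.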

We have these already loaded and see values:

[root@r2c3n4 thermal_throttle]# ls
core_power_limit_count  core_throttle_count  package_power_limit_count  package_throttle_count
[root@r2c3n4 thermal_throttle]# cat *
18781048
0
18781097
0
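
Because `cat *` prints the values without file names, it helps to pair them up
explicitly. Assuming the shell expanded `*` in the same alphabetical order that
`ls` shows, the two non-zero values belong to core_power_limit_count and
package_power_limit_count, while both *_throttle_count files read 0. A minimal
sketch for reading the counters with their names on every CPU and reporting
which ones changed between two samples might look like this (the function names
are hypothetical; the directory layout is the one quoted above):

```python
import glob
import os


def read_throttle_counters(base="/sys/devices/system/cpu"):
    """Return {cpu_name: {counter_file: value}} for each cpuN/thermal_throttle."""
    counters = {}
    pattern = os.path.join(base, "cpu[0-9]*", "thermal_throttle")
    for directory in sorted(glob.glob(pattern)):
        cpu = os.path.basename(os.path.dirname(directory))
        values = {}
        for name in sorted(os.listdir(directory)):
            try:
                with open(os.path.join(directory, name)) as f:
                    values[name] = int(f.read().strip())
            except (OSError, ValueError):
                # unreadable or non-numeric entry; skip it
                continue
        counters[cpu] = values
    return counters


def counter_deltas(before, after):
    """Return {(cpu, counter_file): increase} for counters that changed."""
    deltas = {}
    for cpu, values in after.items():
        for name, value in values.items():
            diff = value - before.get(cpu, {}).get(name, 0)
            if diff:
                deltas[(cpu, name)] = diff
    return deltas
```

Taking one snapshot before a job and one after, then passing both to
counter_deltas, would show whether the power-limit counters advance on the
nodes that slow down.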

This is what led us to look at how the chassis was limiting power.  We had
been using redundancy and switched to non-redundant to try to eliminate
that as a cause.  We believe these messages appear when the CPU is
throttling up in power.  According to the Google oracle, these messages
are benign.  Perhaps that isn't so...

>
>> power consumption is around 80W.  That tells me that the system is
>> cool enough.  Should I not believe those values?  I have no reason to
>> from past experience.
>
> I'm not casting aspersions, just that chassis temps don't tell the whole
> story.  is your exact model of CPU actually rated for higher power?
> we've got some ProLiant SL230s Gen8 with E5-2680s - rated for 130W, and
> don't seem to be throttling.

These are E5-2670 0 @ 2.60GHz, two per node.

>
>> Input air is about 22C.  For our data center, you'd have a better
>> chance of getting this adjusted to 15C than I would!  As for fans,
>> these don't have
>
> yes, well, it is nice to have one's own datacenter ;)
> but seriously, I find it sometimes makes a difference to open front
> and back doors of the rack (if any), do some manual sampling of air
> flow and temperatures (wave hand around)...
>
>> For heat sink thermal grease problems, I'd expect this to be visible
>> using the ipmitools but maybe that is not where the temperatures are
>> being measured.  I don't know about that issue.  I'd expect that a bad
>> thermal grease issue would manifest itself by showing up on a per
>> socket level and not on both sockets.  It seems odd that every node
>> exhibiting this problem would have both sockets having the same issue.
>
> well, if both sockets have poor thermal contact with heatsinks...
> I'm not trying to FUD up any particular vendor(s), but mistakes do happen.
> I was imagining, for instance, that an assembly line might be set up
> with HS and thermal compound tuned for E5-2637 systems (80W/socket), but
> was pressed into service for some E5-2690 nodes (135W).

If that were the cause, I'd expect the bad nodes to be consistently bad.
So far they have mostly been moving targets, randomly distributed.

>
>> Again, the magnitude of the problem is about 5-10% at any time.  Given
>> 600
>
> if I understand you, the prevalence is only 5-10%, but the magnitude
> (effect) is much larger, right?

Right.  We have many jobs which use 32 nodes or more.  Any time a node
goes bad, the whole job starts to crawl along, tying up resources for
days instead of hours.

Bill

>
> regards, mark.