[Beowulf] Problems with Dell M620 and CPU power throttling

Mark Hahn hahn at mcmaster.ca
Fri Aug 30 10:48:03 PDT 2013

> [root at r2c3n4 thermal_throttle]# ls
> core_power_limit_count  core_throttle_count  package_power_limit_count 
> package_throttle_count
> [root at r2c3n4 thermal_throttle]# cat *
> 18781048
> 0
> 18781097
> 0
> This was what led us to how the chassis was limiting power.  We had been

I don't mean to be pedantic, but to me, this is the CPU throttling itself,
based on its own temperature readings and power rating.  The coretemp 
module, judging from its modinfo, seems to be purely on-chip.

/sys/bus/platform/devices/coretemp.0 probably contains some other 
stuff which might be interesting - for instance, what your *_max
values are.
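
As a rough sketch of how one might pull those *_max values out, here is a
small Python helper.  It assumes the usual hwmon convention that temp*_max
holds millidegrees Celsius and that a matching temp*_label file may sit
next to it; the function name read_coretemp_limits is made up for
illustration, and on newer kernels the files may live under a
hwmon/hwmon*/ subdirectory rather than directly in coretemp.0.

```python
from pathlib import Path


def read_coretemp_limits(dev_dir):
    """Return {label: max_temp_in_C} for every temp*_max file in dev_dir.

    Assumes hwmon-style files: temp<N>_max holds millidegrees Celsius,
    and temp<N>_label (if present) names the sensor.
    """
    dev = Path(dev_dir)
    limits = {}
    for max_file in sorted(dev.glob("temp*_max")):
        prefix = max_file.name[: -len("_max")]      # e.g. "temp2"
        label_file = dev / f"{prefix}_label"
        # Fall back to the file prefix when no label file exists.
        label = label_file.read_text().strip() if label_file.exists() else prefix
        limits[label] = int(max_file.read_text().strip()) / 1000.0
    return limits
```

Calling read_coretemp_limits("/sys/bus/platform/devices/coretemp.0") on
one of the affected nodes should then show what the chip itself considers
its limits, independent of anything the chassis is doing.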

> using redundancy and switched to non-redundant to try and eliminate.  We 
> believe that we see these messages when the CPU is throttling up in power.

I read the *_limit_count as meaning "18781048 times the core was 
down-clocked because it exceeded power limits", i.e., not "throttling up",
though I suppose these things are almost symmetric...
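
Since the counters are cumulative, a single cat only says that limiting
happened at some point since boot.  A minimal sketch for sampling them
twice and looking at the difference follows; the helper names
(read_throttle_counts, throttle_delta) are hypothetical, and the path in
the usage note is the usual per-CPU location, which I am assuming matches
the directory shown in the quoted listing.

```python
from pathlib import Path


def read_throttle_counts(throttle_dir):
    """Return {file_name: int} for every *_count file in a thermal_throttle dir."""
    counts = {}
    for f in sorted(Path(throttle_dir).glob("*_count")):
        counts[f.name] = int(f.read_text().strip())
    return counts


def throttle_delta(before, after):
    """Per-counter increase between two snapshots (missing keys count as 0)."""
    return {name: after[name] - before.get(name, 0) for name in after}
```

Taking one snapshot of /sys/devices/system/cpu/cpu0/thermal_throttle,
running the workload, and taking a second snapshot would show whether
core_power_limit_count is still climbing while the *_throttle_count
values stay at 0, as in the output above.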

> These are  E5-2670 0 @ 2.60GHz.  Two per node.

So the spec is 115 W TDP and a Tcase max of 80C.  That's not as low a
threshold as on some chips (67C seems pretty low, for instance).

regards, mark.

