[Beowulf] Problems with Dell M620 and CPU power throttling

Mark Hahn hahn at mcmaster.ca
Fri Aug 30 07:44:30 PDT 2013

> We run the RH 6.x release and are up to date with kernel/OS patches.

have you done any /sys tuning?

> non-redundant.  tuned is set for performance.  Turbo mode is

what knobs does tuned fiddle with?  I would probably turn off all 
auto-tuning and go strictly manual until the issue is understood.
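
before going manual it helps to record what the knobs currently say. a minimal sketch, assuming Python 3 and the standard cpufreq sysfs layout (cpuN/cpufreq/ under /sys/devices/system/cpu). the helper names are made up for illustration, and bios_limit is only exposed by some drivers and kernels, so it may come back as None:

```python
from pathlib import Path

# Standard cpufreq attributes; bios_limit may be absent on some kernels/drivers.
FIELDS = ("scaling_governor", "scaling_cur_freq", "scaling_max_freq",
          "cpuinfo_max_freq", "bios_limit")


def read_text(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def cpufreq_snapshot(cpu_root="/sys/devices/system/cpu"):
    """Collect governor and frequency values (kHz) for every cpuN/cpufreq dir."""
    snapshot = {}
    for cpufreq in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq")):
        snapshot[cpufreq.parent.name] = {
            name: read_text(cpufreq / name) for name in FIELDS
        }
    return snapshot


def capped_cpus(snapshot):
    """Return cpus whose scaling_max_freq is below the hardware maximum."""
    capped = []
    for cpu, entry in snapshot.items():
        try:
            if int(entry["scaling_max_freq"]) < int(entry["cpuinfo_max_freq"]):
                capped.append(cpu)
        except (TypeError, ValueError):
            continue  # attribute missing or not numeric
    return capped


if __name__ == "__main__":
    snap = cpufreq_snapshot()
    for cpu, entry in snap.items():
        print(cpu, entry)
    print("capped:", capped_cpus(snap))
```

taking one snapshot on a healthy node and one on a throttled node, then diffing them, would show whether the cap is visible to the OS at all.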

> on/hyperthreading is off/performance mode is set in BIOS.

I found "x86_energy_perf_policy.c" (author Len Brown), which 
I run on my laptop to set powersave mode.  it sets
MSR_IA32_ENERGY_PERF_BIAS.  it says that the hardware default is performance,
but I wouldn't be surprised if the BIOS sets "normal" instead.
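
to see what the bias is actually set to on a node, the MSR can be read directly. a minimal sketch, assuming Python 3, root access, and the msr kernel module loaded so that /dev/cpu/N/msr exists. the function names are made up for illustration. the register address (0x1B0) and the named values 0/6/15 are the ones x86_energy_perf_policy uses for performance/normal/powersave:

```python
import os

MSR_IA32_ENERGY_PERF_BIAS = 0x1B0

# Named hints used by x86_energy_perf_policy; any value 0..15 is legal.
EPB_NAMES = {0: "performance", 6: "normal", 15: "powersave"}


def read_msr(cpu, reg, dev_root="/dev/cpu"):
    """Read one 64-bit MSR via the msr driver (needs root and `modprobe msr`)."""
    fd = os.open(os.path.join(dev_root, str(cpu), "msr"), os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, reg)  # the file offset selects the MSR address
    finally:
        os.close(fd)
    if len(raw) != 8:
        raise OSError("short read from msr device")
    return int.from_bytes(raw, "little")


def decode_epb(msr_value):
    """Return (bias, name); only the low 4 bits of the MSR hold the hint."""
    bias = msr_value & 0xF
    return bias, EPB_NAMES.get(bias, "intermediate")


if __name__ == "__main__":
    cpu = 0
    while os.path.exists("/dev/cpu/%d/msr" % cpu):
        value = read_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS)
        print("cpu%d: %d (%s)" % ((cpu,) + decode_epb(value)))
        cpu += 1
```

comparing the output across a good node and a throttled one would at least rule this setting in or out.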

> A reboot does not change this problem.  But a power cycle returns the
> compute node to normal again.  Again, we do not know what triggers this

unfortunately, a power cycle will also cool down the system,
so I don't see how it can be dissociated from heating.

> event.  We are not overheating the nodes.

how do you know?

> But while applications are
> running, something triggers an event where this power capping takes effect.

it might be interesting to examine the cpu-heatsink contact.  what you're
describing could be explained by poor heatsink contact (or poor airflow
through the heatsink).

or do you mean that you sample die temps at high resolution and know that 
you're never hitting, say, 60C?
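
for what "sampling at high resolution" might look like: a minimal sketch, assuming Python 3 and that the coretemp driver exposes temp*_input files (in millidegrees C) under /sys/class/hwmon. on older kernels those files sit one level down, under hwmonN/device/, so both layouts are globbed. the function names and the 60C default threshold are illustrative only:

```python
import time
from pathlib import Path

# Newer kernels: hwmonN/tempM_input; older ones: hwmonN/device/tempM_input.
PATTERNS = ("hwmon*/temp*_input", "hwmon*/device/temp*_input")


def read_die_temps(hwmon_root="/sys/class/hwmon"):
    """Return {relative sensor path: degrees C} for every readable sensor."""
    root = Path(hwmon_root)
    temps = {}
    for pattern in PATTERNS:
        for sensor in sorted(root.glob(pattern)):
            try:
                millideg = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor vanished or returned garbage
            temps[sensor.relative_to(root).as_posix()] = millideg / 1000.0
    return temps


def track_peaks(samples, interval_s, threshold_c=60.0,
                hwmon_root="/sys/class/hwmon", sleep=time.sleep):
    """Sample repeatedly; return (peak per sensor, sensors that hit threshold)."""
    peaks = {}
    for i in range(samples):
        for name, temp in read_die_temps(hwmon_root).items():
            if temp > peaks.get(name, float("-inf")):
                peaks[name] = temp
        if i + 1 < samples:
            sleep(interval_s)
    over = sorted(name for name, temp in peaks.items() if temp >= threshold_c)
    return peaks, over


if __name__ == "__main__":
    # e.g. 10 samples per second for one minute while the job is running
    peaks, over = track_peaks(samples=600, interval_s=0.1)
    print("peaks:", peaks)
    print("reached threshold:", over)
```

polling from userspace can still miss short spikes, so a clean result here is weaker evidence than a hit.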

in some machines (albeit less often servers), ACPI provides some knobs
which are made visible under /sys and seem to permit some control of 
thermal mode (fan threshold or scaling).
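
to check whether a given box exposes any of these, the generic thermal class can be listed. a minimal sketch, assuming Python 3 and the kernel's /sys/class/thermal layout (thermal_zone*/ with type, temp and trip_point_N_* files, cooling_device*/ with type, cur_state and max_state). many servers expose nothing here, in which case both lists come back empty. the function names are made up for illustration:

```python
from pathlib import Path


def _read(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def list_thermal_knobs(thermal_root="/sys/class/thermal"):
    """Return (zones, coolers) describing what the thermal class exposes."""
    root = Path(thermal_root)
    zones = []
    for zone in sorted(root.glob("thermal_zone*")):
        trips = {}
        for trip_temp in sorted(zone.glob("trip_point_*_temp")):
            index = trip_temp.name.split("_")[2]  # trip_point_<N>_temp
            trips[index] = {
                "type": _read(zone / ("trip_point_%s_type" % index)),
                "temp_millideg": _read(trip_temp),
            }
        zones.append({
            "name": zone.name,
            "type": _read(zone / "type"),
            "temp_millideg": _read(zone / "temp"),
            "trips": trips,
        })
    coolers = []
    for dev in sorted(root.glob("cooling_device*")):
        coolers.append({
            "name": dev.name,
            "type": _read(dev / "type"),
            "cur_state": _read(dev / "cur_state"),
            "max_state": _read(dev / "max_state"),
        })
    return zones, coolers


if __name__ == "__main__":
    zones, coolers = list_thermal_knobs()
    for item in zones + coolers:
        print(item)
```

a cooling device of type "Processor" with a nonzero cur_state would be worth a closer look, since that is one way the kernel can be told to throttle.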

> If anyone has a clue, or better yet, solved the issue, we'd love to hear
> the solution!

what's your intake air temp?  I would try giving it cold (say, 15C) air.

