[Beowulf] Problems with Dell M620 and CPU power throttling

Christopher Samuel samuel at unimelb.edu.au
Sun Sep 1 20:49:09 PDT 2013


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 30/08/13 23:03, Bill Wichser wrote:

> Since January, when we installed an M620 Sandybridge cluster from
> Dell, we have had issues with power and performance to compute
> nodes.  Dell apparently continues to look into the problem but the
> usual responses have provided no solution.  Firmware, BIOS, OS
> updates all are fruitless.

One question: have you seen either the kernel or the BMC reporting
thermal throttling?  For instance, dmesg should show you something like:

CPU0: Core temperature above threshold, cpu clock throttled (total events = 545939)
CPU0: Core temperature/speed normal
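
If you want to check a whole cluster rather than eyeball one node, a
rough sketch like the following could pull the event counts out of the
dmesg output.  The function name is made up for illustration, and the
pattern assumes the message wording matches the sample above, which may
not hold for every kernel version.

```python
import re

# Matches the thermal-throttle message quoted above. The wording is taken
# from that sample only; other kernel versions may phrase it differently.
THROTTLE_RE = re.compile(
    r"CPU(\d+): Core temperature above threshold, "
    r"cpu clock throttled \(total\s+events = (\d+)\)"
)


def throttle_events(log_text):
    """Return {cpu_number: highest 'total events' count seen} from log text."""
    counts = {}
    for cpu, total in THROTTLE_RE.findall(log_text):
        cpu, total = int(cpu), int(total)
        # The kernel reports a running total, so keep the largest value.
        counts[cpu] = max(total, counts.get(cpu, 0))
    return counts
```

Feed it the text of the kernel log, for example the captured output of
running dmesg on each node.  An empty result means no message of this
form was found, not that throttling is ruled out.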

If you're not, then there is one other possibility that you may like
to test, which is to tell the kernel not to turn on all power-saving
modes automatically, as that introduces a heap of latency (and
potentially other issues).

We pass the following on the kernel command line:

intel_idle.max_cstate=0 processor.max_cstate=1

on our Sandy Bridge nodes for just that reason.
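
To confirm that a node actually booted with those two settings, a small
sketch along these lines could compare them against the contents of
/proc/cmdline.  The function names and the WANTED table are illustrative
only, and the parsing assumes no quoted values containing spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# The two settings quoted above, as expected name/value pairs.
WANTED = {"intel_idle.max_cstate": "0", "processor.max_cstate": "1"}


def missing_cstate_params(cmdline, wanted=WANTED):
    """Return the wanted parameters that are absent or set to another value."""
    params = parse_cmdline(cmdline)
    return {k: v for k, v in wanted.items() if params.get(k) != v}
```

On a running node you would pass in the text read from /proc/cmdline;
an empty dict back means both settings are present as written.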

If you don't, then the kernel will say "Oh, this is an Intel CPU, I
know this!" (to paraphrase Jurassic Park) and enable every power-saving
feature it can find, regardless of what your BIOS/UEFI is set to.
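
One way to see what the kernel ended up enabling is to look at the
cpuidle entries in sysfs.  The sketch below assumes the usual layout
under /sys/devices/system/cpu (a cpuidle/current_driver file and
cpu0/cpuidle/stateN/name files); the function name is hypothetical, and
the path is a parameter so it can be pointed elsewhere.

```python
from pathlib import Path


def cpuidle_summary(sysfs_cpu="/sys/devices/system/cpu"):
    """Return (cpuidle driver name or None, list of idle-state names for cpu0)."""
    root = Path(sysfs_cpu)
    driver_file = root / "cpuidle" / "current_driver"
    driver = driver_file.read_text().strip() if driver_file.exists() else None
    # Sort state0, state1, ... numerically so state10 follows state9.
    state_dirs = sorted(
        (root / "cpu0" / "cpuidle").glob("state*"),
        key=lambda p: int(p.name[len("state"):]),
    )
    states = [(d / "name").read_text().strip() for d in state_dirs]
    return driver, states
```

If the driver comes back as intel_idle with a long list of states, the
kernel is managing C-states itself; comparing the output before and
after adding the boot parameters should show the difference.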

Best of luck!
Chris
- -- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlIkCrUACgkQO2KABBYQAh+G8wCfYEoGbbufa/xdqCOLQNOxpmmp
a9MAnRLa4lp0ZqId4XgZylP1fx9M9Fcc
=l1C9
-----END PGP SIGNATURE-----


