[Beowulf] Partial OT: CPU grouping control for Windows 2008 R2 x64 server for big calcs
landman at scalableinformatics.com
Thu Jan 12 12:45:16 PST 2012
Ok, this one is fun. For some definitions of fun. Unusual definitions
of fun... And there is a question towards the end. This is for folks
who've been administering clusters and HPC systems with big Windows
machines (32+ CPUs and large RAM).
Imagine you have a machine as part of a very loose computing cluster.
The end user wants to run Windows (2008 R2 x64 Enterprise) on it. This
machine has 32 processor cores (real ones, no hyperthreading) and 1 TB RAM.
Yeah, it's a fun machine to work on. I won't discuss the OS choice here.
You can see some of my playing with it here:
http://scalability.org/?p=3541 and http://scalability.org/?p=3515
Windows machines can let up to 64 logical processors be part of a
"group". A group is a scheduling artifice, and not necessarily directly
related to the NUMA layout ... think of it as an abstraction layer above
the NUMA topology.
Ok, still with me?
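For the curious, an application can ask the kernel how many groups it is looking at. A minimal sketch (my illustration, not anything Matlab does) using Python's ctypes against the real kernel32 calls GetActiveProcessorGroupCount and GetActiveProcessorCount, which exist on Windows 7 / Server 2008 R2 and later; on other platforms it just says so:

```python
import ctypes
import sys

def report_processor_groups():
    """Print the active processor group count and the CPUs in each group.

    Uses the Win32 GetActiveProcessorGroupCount / GetActiveProcessorCount
    APIs (Windows 7 / Server 2008 R2 and later). Returns the group count,
    or None on non-Windows platforms.
    """
    if sys.platform != "win32":
        print("processor groups are a Windows-only concept")
        return None
    kernel32 = ctypes.windll.kernel32
    ngroups = kernel32.GetActiveProcessorGroupCount()
    for g in range(ngroups):
        ncpus = kernel32.GetActiveProcessorCount(ctypes.c_ushort(g))
        print(f"group {g}: {ncpus} logical processors")
    return ngroups

if __name__ == "__main__":
    report_processor_groups()
```

On the machine above, before the Hyper-V role was added, I'd expect this to report two groups of 16.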
This scheduling artifice, these groups, require at minimum a
recompilation to work with properly. It's actually more than that: they
require some additional processor affinity bits to be handled. If you
have a code which doesn't handle this correctly, it will probably crash.
Or not work well. Or both.
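Those "additional processor affinity bits" boil down to using the group-aware GROUP_AFFINITY calls (SetThreadGroupAffinity and friends) instead of the legacy single-group SetThreadAffinityMask, which cannot address CPUs outside the calling thread's group. A hedged sketch via ctypes; the group/CPU target is purely illustrative:

```python
import ctypes
import sys

class GROUP_AFFINITY(ctypes.Structure):
    # Layout from winnt.h: a pointer-sized CPU mask plus the group number.
    _fields_ = [("Mask", ctypes.c_size_t),       # KAFFINITY
                ("Group", ctypes.c_ushort),
                ("Reserved", ctypes.c_ushort * 3)]

def pin_current_thread(group, cpu):
    """Pin the calling thread to one CPU in one processor group.

    The legacy SetThreadAffinityMask API only sees the thread's current
    group; spanning a >64-CPU box requires SetThreadGroupAffinity.
    Returns True on success, None on non-Windows platforms.
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    aff = GROUP_AFFINITY(Mask=1 << cpu, Group=group)
    ok = kernel32.SetThreadGroupAffinity(
        kernel32.GetCurrentThread(), ctypes.byref(aff), None)
    return bool(ok)
```

A code that was written against the old API and never recompiled against the group-aware one is exactly the sort of thing that falls over on a multi-group box.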
Matlab appears to be such a beast. This isn't necessarily a Matlab
issue per se; it appears to be something of a design compromise in
Windows, which wasn't designed with large processor counts in mind.
The changes they'd need to make in order to enable a single large
spanning entity across all CPUs at once are quite likely not in the
company's best interests, as there are very few customers with such
large machines.
Still with me? Here's the problem.
Matlab seems to crash (according to the user) if run on a unit with more
than one group. I've not been able to verify this on the machine myself
yet, but I have no reason to disbelieve it. The issue, as it's been
stated to me, is that if there is more than one group of processors,
Matlab crashes. This is the symptom.
When the unit boots by default, we have two 16-processor groups. So
looking at bcdedit examples, I see how to turn off groups.
One minor problem.
It doesn't work.
I can do a
bcdedit /set groupaware off
and reboot. That should completely disable groups, so that all 32
processors are in one group. Still 2 groups.
I can do a
bcdedit /set groupsize 64
and reboot. Still 2 groups.
So far, the only thing that seems to change this is if I install the
hyperV role. With that, there is now 1 group.
Looking at all the boot options with bcdedit, there's only one boot
configuration, and it's the default.
So ... my questions:
1) Does Windows really ignore these settings (its approximate
equivalent of boot options on a grub line)?
2) Is there any way to compel Windows to do the right thing?
As noted, this is for a computing cluster. Our recommended OS isn't
feasible right now for them and their application.
Definitely annoying. I'd love there to be a BIOS setting to help
Windows past its desire to ignore my requested number of groups. Not
sure if adding in the Hyper-V role will impact performance (I did some
basic testing with Scilab, and I didn't see anything I'd call
significant).
Will be bugging Microsoft about this as well (pretty obviously a bug in
Windows).
And related to this, I read something about limits in the different
Windows editions. Is anyone using Windows HPC Server on big-memory
machines with lots of cores? Looking at the Microsoft docs, they
indicate some relatively low limits on RAM and processor count. Does
this mean that they won't be supporting 4-socket Interlagos machines
(16 cores per socket) with 1/2 TB RAM as compute nodes for Windows
HPC? I am just imagining someone buying a few of those nodes and being
required to buy Enterprise or Datacenter licenses for those machines
(which clearly would not be used for anything more than HPC).
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615