[Beowulf] Odd Infiniband scaling behaviour - *SOLVED* - MVAPICH2 problem
Chris Samuel
csamuel at vpac.org
Mon Oct 8 17:20:55 PDT 2007
On Mon, 8 Oct 2007, Chris Samuel wrote:
> If I then run 2 x 4 CPU jobs of the *same* problem, they all run at
> 50% CPU.
With big thanks to Mark Hahn, this problem is solved. Infiniband is
exonerated, it was the MPI stack that was the problem!
Mark suggested that this sounded like a CPU affinity problem, and he
was right.
Turns out that when you build MVAPICH2 (in our case mvapich2-0.9.8p3)
on an AMD64 or EM64T system it defaults to compiling in and enabling
CPU affinity support.
So if we take the example of 4 x 2 CPU jobs, that has the unfortunate
effect of binding all those MPI processes to the first 2 cores in the
system, which is why we see only 25% CPU utilisation per process
(watched via top, and evident from the comparative run times).
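If anyone wants to confirm the same thing on their own nodes, something
like the following should do it (assuming taskset from util-linux is on
the compute nodes; namd2 is just our application, substitute your own):

  # Print the CPU affinity list of each NAMD process on this node
  for pid in $(pgrep namd2); do
      taskset -cp $pid
  done

With the broken behaviour every PID reports the same couple of cores
rather than being spread across the node.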
Fortunately though it does check the user's environment for the
variable MV2_ENABLE_AFFINITY, and if that is set to 0 then the
affinity setting is bypassed.
So simply modifying my PBS script to include:
export MV2_ENABLE_AFFINITY=0
before using mpiexec [1] to launch the jobs results in a properly
performing system again!
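For reference, a minimal sketch of the relevant part of the PBS script
(the resource request and input file here are just illustrative, not
our actual settings):

  #!/bin/bash
  #PBS -l nodes=1:ppn=2

  cd $PBS_O_WORKDIR

  # Stop MVAPICH2 pinning every rank to the first cores on the node
  export MV2_ENABLE_AFFINITY=0

  mpiexec namd2 input.namd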
I'm currently running 4 x 2 CPU NAMD jobs and they're back to properly
consuming 100% CPU per process.
Phew!
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency