[Beowulf] Odd Infiniband scaling behaviour - *SOLVED* - MVAPICH2 problem

Chris Samuel csamuel at vpac.org
Mon Oct 8 17:20:55 PDT 2007


On Mon, 8 Oct 2007, Chris Samuel wrote:

> If I then run 2 x 4 CPU jobs of the *same* problem, they all run at
> 50% CPU.

With big thanks to Mark Hahn, this problem is solved. Infiniband is 
exonerated; it was the MPI stack that was the problem!

Mark suggested that this sounded like a CPU affinity problem, and he 
was right.

Turns out that when you build MVAPICH2 (in our case mvapich2-0.9.8p3) 
on an AMD64 or EM64T system, it defaults to compiling in and enabling 
CPU affinity support.

So if we take the example of 4 x 2 CPU jobs, this has the unfortunate 
effect of binding all of those MPI processes to the first 2 cores in 
the system, which is why we see only 25% CPU utilisation per process 
(watched via top, and evident from the comparative run time).
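
If you want to confirm this for yourself, one quick (Linux-specific) 
way is to look at the affinity mask of each running rank. This is just 
a sketch; it assumes the NAMD binary is called namd2, so adjust the 
process name for your own application:

  # Show the CPU affinity list for each NAMD rank
  for pid in $(pgrep namd2); do
      taskset -cp $pid
  done
  # With the affinity problem every rank reports the same first cores
  # (e.g. "0,1") instead of being spread across all cores in the node.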

Fortunately, though, it does check the user's environment for the 
variable MV2_ENABLE_AFFINITY, and if that is set to 0 the affinity 
setting is bypassed.

So simply modifying my PBS script to include:

export MV2_ENABLE_AFFINITY=0

before using mpiexec [1] to launch the jobs results in a properly 
performing system again!
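
For the record, the relevant part of the PBS script now looks 
something like the sketch below. The resource requests, input file and 
output file names are purely illustrative, not our actual settings:

  #!/bin/bash
  #PBS -l nodes=1:ppn=2
  #PBS -l walltime=24:00:00

  cd $PBS_O_WORKDIR

  # Stop MVAPICH2 pinning every rank to the first cores on the node
  export MV2_ENABLE_AFFINITY=0

  mpiexec namd2 sim.conf > sim.log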

I'm currently running 4 x 2 CPU NAMD jobs and they're back to properly 
consuming 100% CPU per process.

Phew!

Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency