[Beowulf] Re: Performance degrading

Gus Correa gus at ldeo.columbia.edu
Tue Dec 22 16:35:49 PST 2009


Hi Jorg

I agree that your old OpenMPI 1.2.8 should not be the problem,
and upgrading now would only add confusion.
I only suggested running simple test programs (cpi.c, connectivity_c.c)
to make sure all works right, including your network setup.
However, you or somebody else may already have done this in the past.
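
If it has not been done, the check is quick with OpenMPI
(cpi.c and connectivity_c.c come with the MPI example programs;
the process count and host file below are just placeholders):

mpicc cpi.c -o cpi
mpiexec -np 8 --hostfile myhosts ./cpi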

Hybrid communication schemes have to be handled with care.
We have plenty of MPI+OpenMP programs here.
Normally we request one processor per OpenMP thread for each
MPI task. For instance, if each MPI task opens four threads,
and you want three MPI tasks, then a total of 12 processors
is requested from the batch system (e.g. #PBS -l nodes=3:ppn=4).
However, only 3 MPI processes are launched with mpiexec
(mpiexec -bynode -np 3 executable_name).
The "-bynode" option will put one MPI task on each of three nodes,
and each of these three MPI tasks will launch 4 OpenMP threads,
using the 4 local processors.
However, GA is not the same as OpenMP,
and another scheme may apply.
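
Just to make the OpenMP case concrete, a minimal PBS script for
that 3-task x 4-thread example might look like the sketch below
(the executable name is a placeholder, and OMP_NUM_THREADS assumes
plain OpenMP, which may not carry over to GA):

#PBS -l nodes=3:ppn=4
#PBS -N hybrid_test
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=4
mpiexec -bynode -np 3 ./my_hybrid_program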

I don't know about GA.
I looked up their web page (at PNL),
but I didn't find a direct answer to your problem.
I would have to read more to learn about GA.
See if this GA support page may shed some light on how you
configured it (or how it is configured inside NWChem):
http://www.emsl.pnl.gov/docs/global/support.html
See their comments about SHMMAX (the maximum shared memory segment
size) in Linux kernels, as this may perhaps be the problem.
Here, on 32-bit Linux machines I have a number
smaller than they recommend (134217728 bytes, 128MB):

cat /proc/sys/kernel/shmmax
33554432

But on 64-bit machines it is much larger:

cat /proc/sys/kernel/shmmax
68719476736

You may check this on your nodes, and if you have a low number
(say, on a 32-bit node), perhaps try their suggestion,
and ask the system administrator to change this kernel parameter
on the nodes by doing:

echo "134217728" >/proc/sys/kernel/shmmax

GA seems to be a heavy user of shared memory,
hence it is likely to require more shared memory
resources than normal programs do.
Therefore, there is a slim chance that increasing SHMMAX may help.

I also found the NWChem web site (also at PNL).
You may well know all about this already, so forgive me any silly suggestions.
I am not a chemist, computational or otherwise.
I am still trying to get pH and hydrogen bonds right.
The NWChem User Guide, Appendix D (around page 401; it is a big guide!)
has suggestions on how to run on different machines,
including Linux clusters with MPI (section D.3).
http://www.emsl.pnl.gov/capabilities/computing/nwchem/docs/usermanual.pdf

They also have an FAQ about Linux clusters:
http://www.emsl.pnl.gov/capabilities/computing/nwchem/support/faq.jsp#Linux

They also have a "known bugs" list:
http://www.emsl.pnl.gov/root/capabilities/computing/nwchem/support/knownbugs/
Somehow they seem to talk only about MPICH
(not sure if MPICH1 or MPICH2),
but not about OpenMPI.
Likewise, browsing through the GA material I could find no direct
reference to OpenMPI, only to MPICH,
although in theory MPI is a standard,
and portable programs and libraries should work right with any MPI
flavor (in practice I am not so sure this is true).

Also, GA seems to have specific instructions for InfiniBand
(but not for Ethernet); see the GA web link above.
What do you have, IB or Ethernet?
If you have both, you can select one (say, IB)
with the OpenMPI MCA parameters
on the mpiexec command line.
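
For instance, something along these lines should pin OpenMPI 1.2.x
to one interconnect or the other (the NWChem command line is just a
placeholder; check ompi_info on your build for the exact BTL names):

mpiexec --mca btl openib,sm,self -np 3 nwchem input.nw   (InfiniBand)
mpiexec --mca btl tcp,sm,self -np 3 nwchem input.nw      (TCP over Ethernet)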

I know it won't help ... but I tried ...  :(

Good luck, and Happy Holidays!

Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------


Jörg Saßmannshausen wrote:
> Hi all,
> 
> right, following the various suggestions and the idea about oversubscribing the 
> node, I have started the same calculation again on 3 nodes, but this time with 
> only 3 processes started on the node, so that will leave room for the 4th 
> process, which I believe is started by NWChem.
> Has anything changed? No.
> 
> top - 10:24:06 up 21 days, 17:29,  1 user,  load average: 0.25, 0.26, 0.26
> Tasks: 131 total,   1 running, 130 sleeping,   0 stopped,   0 zombie
> Cpu0  :  4.3% us,  1.0% sy,  0.0% ni, 93.7% id,  0.0% wa,  0.0% hi,  1.0% si
> Cpu1  :  2.7% us,  0.0% sy,  0.0% ni, 97.3% id,  0.0% wa,  0.0% hi,  0.0% si
> Cpu2  :  0.7% us,  0.0% sy,  0.0% ni, 99.3% id,  0.0% wa,  0.0% hi,  0.0% si
> Cpu3  :  0.0% us,  0.0% sy,  0.0% ni, 99.0% id,  1.0% wa,  0.0% hi,  0.0% si
> Mem:  12308356k total,  5251744k used,  7056612k free,   377052k buffers
> Swap: 24619604k total,        0k used, 24619604k free,  3647568k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 29017 sassy     15   0 3921m 1.7g 1.4g S    3 14.8 384:50.34 nwchem
> 29019 sassy     15   0 3920m 1.8g 1.5g S    2 15.4 379:26.55 nwchem
> 29018 sassy     15   0 3920m 1.8g 1.5g S    1 15.5 380:31.15 nwchem
> 29021 sassy     15   0 2943m 1.7g 1.7g S    1 14.9  42:33.81 nwchem   <- 
> process started by NWChem I suppose
> 
> As Reuti pointed out to me, NWChem is using Global Arrays internally and only 
> MPI for communication. I don't think the problem is the OpenMPI I have. I 
> could upgrade to the latest, but that means I have to re-link all the 
> programs I am using.
> 
> Could the problem be the GA?
> 
> All the best
> 
> Jörg
> 
> 



