[Beowulf] Re: scalability

Gus Correa gus at ldeo.columbia.edu
Fri Dec 11 15:34:45 PST 2009


Hi Amjad

amjad ali wrote:
> Hi Gus,
> 
>     I was told that some people used to run two processes only on
>     dual-socket dual-core Xeon nodes , leaving the other two cores idle.
>     Although it is an apparent waste, the argument was that it paid
>     off in terms of overall efficiency.
> 
> 
> I guess I fully agree with this.
> 
> 
>     Have you tried to run your programs this way on your cluster?
>     Say, with one process only per node, and N nodes,
>     then with two processes per node, and N/2 nodes,
>     then with four processes per node, and N/4 nodes.
>     This may tell what is optimal for the hardware you have.
>     With OpenMPI you can use the "mpiexec" flags
>     "-bynode" and "-byslot" to control this behavior.
>     "man mpiexec" is your friend!  :)
> 
> 
> does mpich also provide this?
> or will it be controlled by the scheduler?

A mixed answer.

I think you can do this with MPICH2, but it is not as easy
as with OpenMPI; it depends on other things, particularly on
which mpiexec you use.

1) You can use Torque/Maui and request full nodes,
as Chris Samuel suggested to you in another thread.
E.g.:
#PBS -l nodes=10:ppn=8

This doesn't guarantee or require that your
job will use only a single core per node,
but it ensures that nobody else
will be running anything on those nodes but you.
Hence, this is only a preliminary step.
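
For instance, a bare-bones Torque/PBS script could look like the
sketch below (the walltime, the 8-core ppn value, and the job name
are just placeholders for whatever fits your cluster and code):

#!/bin/bash
#PBS -l nodes=10:ppn=8
#PBS -l walltime=12:00:00
#PBS -N my_cfd_job

cd $PBS_O_WORKDIR
# ... your mpiexec command goes here (see items 2 and 3 below) ...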

2) If you use mpd and the native MPICH2 mpiexec to launch programs
compiled with MPICH2, you can control where the processes run
with the "-machinefile" or the "-configfile" option.
For this to take effect, you also need to tweak the
contents of your "machinefile"/"configfile".
For instance, you could add a few lines to the PBS script,
before the mpiexec command, that read $PBS_NODEFILE
and build the "machinefile"/"configfile"
with the nodes you selected (a sketch follows below).
That is more involved than with OpenMPI, but not too hard to do.
You must read "man mpiexec" to do this right.
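
Here is a minimal sketch of what could go in the PBS script right
before the mpiexec command.  It assumes the mpd ring is already up,
that "my_cfd_code" stands for your executable, and that your MPICH2
mpiexec accepts "-machinefile" (check "man mpiexec" for your version):

# $PBS_NODEFILE lists each node once per core requested,
# so with ppn=8 every node shows up 8 times.
# Keep each node only once, to get one process per node:
sort -u $PBS_NODEFILE > machinefile
NP=$(wc -l < machinefile)

mpiexec -machinefile machinefile -np $NP ./my_cfd_code

If you want, say, two processes per node instead of one, and your
MPICH2 supports the "host:n" machinefile format, append ":2" to each
line of the machinefile and double NP accordingly.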

3) If you use the OSC mpiexec
(http://www.osc.edu/~djohnson/mpiexec/index.php)
to launch programs compiled with MPICH2,
you can use the "-pernode" option to run a single process per node,
which is similar to OpenMPI's "-bynode" and easy to do.
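
Since OSC mpiexec talks to Torque directly, no machinefile is needed,
and the launch line in the PBS script can be as simple as the sketch
below ("my_cfd_code" is again just a placeholder):

mpiexec -pernode ./my_cfd_code

This starts exactly one process on each node that Torque assigned
to the job.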

4) If you still use MPICH1, which is old, unmaintained,
and troublesome, then upgrade to OpenMPI or MPICH2,
and use the solutions proposed here and in previous emails.

> 
> But still if it is a shared cluster (as in my case) then the cores you
> left unbusy may be allocated to another process of another user by the 
> Batch scheduler. Right??

Unless you request full nodes, as Chris Samuel suggested:
#PBS -l nodes=10:ppn=8

However, beware that this greedy and wasteful behavior
may drive your system administrator and
the other cluster users mad at you!  :)
Well, you can always justify it in the name of science, of course. ;)

> 
> 
> 
>         Yes, CFD codes are memory bandwidth bound usually.
> 
> 
>     Indeed, and so is most of our atmosphere/ocean/climate codes,
>     which has a lot of CFD, but also radiative processes, mixing,
>     thermodynamics, etc.
>     However, most of our models use fixed grids, and I suppose
>     some of your aerodynamics may use adaptive meshes, right?
>     I guess you are doing aerodynamics, right?
> 
> 
> Amazing!!
> but I would really love to know (in fact, learn) which
> factors/indications made you guess so correctly.
> 

Google is not only your friend.
It is also *my* friend!  :)
Is the Amjad Ali Pasha listed here yourself or somebody else?

http://www.aero.iitb.ac.in/aero/people/students/phd.html

Dialog here is a two-way road, a cooperative and open exchange.
My identity is stamped on the signature block of all my messages,
no secret about it.
Why not yours?

> 
> I would offer you 6 cents.
> 2 cents --- you missed below.
> 2 extra.
> 2 cents for next email.
> 
> 
>         Thank you very much.
> 

My two Rupees. :)

Best,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

> 
>     My pleasure.
> 
> 
>     I hope this helps,
> 
>     Gus Correa
>     ---------------------------------------------------------------------
>     Gustavo Correa
>     Lamont-Doherty Earth Observatory - Columbia University
>     Palisades, NY, 10964-8000 - USA
>     ---------------------------------------------------------------------
> 



