[Beowulf] Re: Performance degrading

Jörg Saßmannshausen jorg.sassmannshausen at strath.ac.uk
Wed Dec 16 01:41:39 PST 2009


Hi guys,

OK, some more information.
I am using OpenMPI-1.2.8 and I only start 4 processes per node, so my hostfile
looks like this:
comp12 slots=4
comp18 slots=4
comp08 slots=4
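
For completeness, I launch the job roughly like this (sketched from memory;
"hostfile" and the input file name are placeholders):

mpirun --hostfile hostfile -np 12 nwchem input.nw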

And yes, one process is the idle one which does things in the background. I
have observed similar degradations before with a different program (GAMESS),
where in the end running a job on one node was _faster_ than running it on
more than one node. Clearly, there is a problem here.

It is interesting to note that the fifth process is consuming memory as well;
I did not see that at the time I posted. That is somewhat odd too, as a
different calculation (same program) does not show that behaviour. I assume
it is one extra process per job group which acts as a master or shepherd
for the slave processes. I know that GAMESS (which uses DDI rather than MPI)
has one additional process as a data server.

IIRC, the extra process does come from NWChem, but I doubt I am
oversubscribing the node, since that process usually should not do much, as
mentioned before.
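
One way to check what each node is actually doing is to look at per-process
CPU and memory usage on a compute node, e.g. (standard ps options, nothing
NWChem-specific):

ssh comp12 'ps -u $USER -o pid,pcpu,pmem,comm --sort=-pcpu'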

I am still wondering whether this could be a network issue.
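
If it is the network, a simple point-to-point bandwidth test between two of
the nodes should show it, e.g. with the OSU micro-benchmarks (assuming they
are installed; osu_bw is part of that suite):

mpirun -np 2 -host comp12,comp18 osu_bw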

Thanks for your comments!

All the best

Jorg


On Wednesday 16 December 2009 04:42:59 beowulf-request at beowulf.org wrote:
> Hi Glen, Jorg
>
> Glen: Yes, you are right about MPICH1/P4 starting extra processes.
> However, I wonder if that is what is happening to Jorg,
> or if what he reported is just plain CPU oversubscription.
>
> Jorg:  Do you use MPICH1/P4?
> How many processes did you launch on a single node, four or five?
>
> Glen:  Out of curiosity, I dug out the MPICH1/P4 I still have on an
> old system, compiled and ran "cpi.c".
> Indeed there are extra processes there, besides the ones that
> I intentionally started in the mpirun command line.
> When I launch two processes on a two-single-core-CPU machine,
> I also get two (not only one) extra processes, for a total of four.
>
> However, as you mentioned,
> the extra processes do not seem to use any significant CPU.
> Top shows the two actual processes close to 100% and the
> extra ones close to zero.
> Furthermore, the extra processes don't use any
> significant memory either.
>
> Anyway, in Jorg's case all processes consumed about
> the same (low) amount of CPU, but ~15% memory each,
> and there were 5 processes (only one "extra"? Is it one per CPU socket?
> Is it one per core? One per node?).
> Hence, I would guess Jorg's context is different.
> But ... who knows ... only Jorg can clarify.
>
> These extra processes seem to be related to the
> mechanism used by MPICH1/P4 to launch MPI programs.
> They don't seem to appear in recent OpenMPI or MPICH2,
> which have other launching mechanisms.
> Hence my guess that Jorg had an oversubscription problem.
>
> Considering that MPICH1/P4 is old, no longer maintained,
> and seems to cause more distress than joy in current kernels,
> I would not recommend it to Jorg or to anybody anyway.
>
> Thank you,
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------

-- 
*************************************************************
Jörg Saßmannshausen
Research Fellow
University of Strathclyde
Department of Pure and Applied Chemistry
295 Cathedral St.
Glasgow
G1 1XL

email: jorg.sassmannshausen at strath.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html



