[Beowulf] Re: Performance degrading
Jörg Saßmannshausen
jorg.sassmannshausen at strath.ac.uk
Wed Dec 16 01:41:39 PST 2009
Hi guys,
OK, here is some more information.
I am using OpenMPI-1.2.8 and I only start 4 processes per node, so my
hostfile looks like this:
comp12 slots=4
comp18 slots=4
comp08 slots=4
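For completeness, the job is then started with something along these lines
(the binary and input file names here are just placeholders, not the actual
command):

  mpirun --hostfile ./hostfile -np 12 nwchem input.nw

i.e. 4 processes on each of the three nodes.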
And yes, one process is the idle one which does things in the background. I
have observed similar degradations before with a different program (GAMESS),
where in the end running a job on one node was _faster_ than running it on
more than one node. Clearly, there is a problem here.
It is interesting to note that the fifth process is consuming memory as
well; I did not see that at the time I posted. That is somewhat odd too, as
a different calculation (same program) does not show that behaviour. I
assume it is one extra process per job group which acts as a master or
shepherd for the slave processes. I know that GAMESS (which does not use
MPI but DDI) has one additional process as a data server.
IIRC, the extra process does come from NWChem, but I doubt I am
oversubscribing the node, as that process usually should not do much, as
mentioned before.
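To be sure, I will watch the per-process CPU and memory use on one of the
nodes while the job is running, along these lines (the process name is a
placeholder for whatever the NWChem binary is called here):

  # list CPU and memory share of all NWChem processes on this node
  ps -eo pid,pcpu,pmem,comm | grep nwchem

If the fifth process sits near 0% CPU it is just the shepherd; if all five
are busy, the node really is oversubscribed.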
I am still wondering whether this could be a network issue.
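To rule that out, I will probably run a point-to-point benchmark between two
of the nodes, e.g. (assuming the OSU micro-benchmarks, or NetPIPE, are built
against the same OpenMPI; the two-node hostfile below is hypothetical):

  # two_nodes contains:
  #   comp12 slots=1
  #   comp18 slots=1
  mpirun --hostfile ./two_nodes -np 2 ./osu_latency
  mpirun --hostfile ./two_nodes -np 2 ./osu_bw

and also check the interfaces for errors or drops with netstat -i on each
node. If latency or bandwidth is far off what the interconnect should
deliver, that would explain the degradation.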
Thanks for your comments!
All the best
Jorg
On Wednesday 16 December 2009 04:42:59 beowulf-request at beowulf.org wrote:
> Hi Glen, Jorg
>
> Glen: Yes, you are right about MPICH1/P4 starting extra processes.
> However, I wonder if that is what is happening to Jorg,
> or if what he reported is just plain CPU oversubscription.
>
> Jorg: Do you use MPICH1/P4?
> How many processes did you launch on a single node, four or five?
>
> Glen: Out of curiosity, I dug out the MPICH1/P4 I still have on an
> old system, compiled and ran "cpi.c".
> Indeed there are extra processes there, besides the ones that
> I intentionally started in the mpirun command line.
> When I launch two processes on a machine with two single-core CPUs,
> I also get two (not only one) extra processes, for a total of four.
>
> However, as you mentioned,
> the extra processes do not seem to use any significant CPU.
> Top shows the two actual processes close to 100% and the
> extra ones close to zero.
> Furthermore, the extra processes don't use any
> significant memory either.
>
> Anyway, in Jorg's case all processes consumed about
> the same (low) amount of CPU, but ~15% memory each,
> and there were 5 processes (only one "extra"? Is it one per CPU
> socket, one per core, or one per node?).
> Hence, I would guess Jorg's context is different.
> But ... who knows ... only Jorg can clarify.
>
> These extra processes seem to be related to the
> mechanism used by MPICH1/P4 to launch MPI programs.
> They don't seem to appear in recent OpenMPI or MPICH2,
> which have other launching mechanisms.
> Hence my guess that Jorg had an oversubscription problem.
>
> Considering that MPICH1/P4 is old, no longer maintained,
> and seems to cause more distress than joy in current kernels,
> I would not recommend it to Jorg or to anybody anyway.
>
> Thank you,
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
--
*************************************************************
Jörg Saßmannshausen
Research Fellow
University of Strathclyde
Department of Pure and Applied Chemistry
295 Cathedral St.
Glasgow
G1 1XL
email: jorg.sassmannshausen at strath.ac.uk
web: http://sassy.formativ.net
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html