[Beowulf] Performance degrading

Gus Correa gus at ldeo.columbia.edu
Tue Dec 15 17:04:10 PST 2009

Hi Glen, Jorg

Glen: Yes, you are right about MPICH1/P4 starting extra processes.
However, I wonder if that is what is happening to Jorg,
or if what he reported is just plain CPU oversubscription.

Jorg:  Do you use MPICH1/P4?
How many processes did you launch on a single node, four or five?

Glen:  Out of curiosity, I dug out the MPICH1/P4 installation I still
have on an old system, and compiled and ran "cpi.c".
Indeed, there are extra processes there, besides the ones that
I intentionally started on the mpirun command line.
When I launch two processes on a machine with two single-core CPUs,
I also get two (not just one) extra processes, for a total of four.

However, as you mentioned,
the extra processes do not seem to use any significant CPU.
Top shows the two actual processes close to 100% and the
extra ones close to zero.
Furthermore, the extra processes don't use any
significant memory either.

Anyway, in Jorg's case all processes consumed about
the same (low) amount of CPU, but ~15% memory each,
and there were five processes (only one "extra"? Is it one per CPU
socket? One per core? One per node?).
Hence, I would guess Jorg's context is different.
But ... who knows ... only Jorg can clarify.

These extra processes seem to be related to the
mechanism used by MPICH1/P4 to launch MPI programs.
They don't seem to appear in recent OpenMPI or MPICH2,
which have other launching mechanisms.
Hence my guess that Jorg had an oversubscription problem.
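One way to separate the two cases from the command line is to count how many ranks are actually CPU-busy and compare that with the core count: busy ranks in excess of cores point to oversubscription, while the P4 helper processes would sit near 0% CPU and not be counted. Below is a minimal sketch of that check; it assumes a Linux node with the procps `ps`, and the process name and 50% threshold are illustrative, not anything from Jorg's setup.

```python
import os
import subprocess

def busy_procs(name, cpu_threshold=50.0):
    """Count processes named `name` that ps reports as CPU-busy."""
    # `ps -C <name> -o %cpu=` prints one %CPU figure per matching
    # process (procps/Linux ps; empty output if none match).
    out = subprocess.run(["ps", "-C", name, "-o", "%cpu="],
                         capture_output=True, text=True).stdout
    return sum(1 for field in out.split() if float(field) > cpu_threshold)

def oversubscribed(name):
    # More CPU-busy ranks than cores suggests plain oversubscription;
    # idle P4 helper processes near 0% CPU fall below the threshold.
    return busy_procs(name) > os.cpu_count()
```

On a quad-core node running five busy nwchem ranks, oversubscribed("nwchem") would come back True, while a node with idle P4 helpers alongside a full set of busy ranks would not trip it.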

Considering that MPICH1/P4 is old, no longer maintained,
and seems to cause more distress than joy on current kernels,
I would not recommend it to Jorg or to anybody anyway.

Thank you,
Gus Correa
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA

Glen Beane wrote:
> On 12/15/09 2:36 PM, "Gus Correa" <gus at ldeo.columbia.edu> wrote:
>     If you have single quad core nodes as you said,
>     then top shows that you are oversubscribing the cores.
>     There are five nwchem processes running.
> It has been a very long time, but wasn't that normal behavior for mpich 
> under certain instances?  If I recall correctly it had an extra process 
> that was required by the implementation. I don't think it returned from 
> MPI_Init, so you'd have a bunch of processes consuming nearly a full CPU 
> and then one that was mostly idle doing something behind the scenes.  I 
> don't remember if this was for mpich/p4 (with or without 
> --with-comm=shared) or for mpich-gm.
> -- 
> Glen L. Beane
> Software Engineer
> The Jackson Laboratory
> Phone (207) 288-6153
> ------------------------------------------------------------------------
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf