Running two MPI jobs simultaneously

Mehmet Suzen mehmet.suzen at bristol.ac.uk
Tue Dec 10 08:24:02 PST 2002


Are you perhaps using a queue system? It sounds like your second program 
is waiting in the queue. If you don't have a queue system, you should 
set one up.
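
To tell a job that is queued or blocked from one that is merely running 
slowly, one rough check is to compare the CPU time a process has consumed 
with the wall-clock time that has passed. The sketch below is only an 
illustration in plain C and is not taken from either program; the helper 
names (wall_seconds, cpu_seconds, cpu_fraction, report_phase) are 
hypothetical. It assumes a POSIX system that provides clock_gettime. A 
fraction near 1 suggests the process is getting the CPU (note that a 
busy-waiting process would also show this), while a fraction near 0 
suggests it is sleeping or blocked.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: wall-clock seconds from a monotonic clock. */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: CPU seconds consumed by this process so far. */
static double cpu_seconds(void)
{
    return (double)clock() / (double)CLOCKS_PER_SEC;
}

/* Fraction of the elapsed wall time that was spent on the CPU.
   Returns 0 when no wall time has elapsed, to avoid dividing by zero. */
static double cpu_fraction(double wall_elapsed, double cpu_elapsed)
{
    if (wall_elapsed <= 0.0)
        return 0.0;
    return cpu_elapsed / wall_elapsed;
}

/* Print a one-line report for a phase that started at (wall0, cpu0). */
static void report_phase(const char *label, double wall0, double cpu0)
{
    double w = wall_seconds() - wall0;
    double c = cpu_seconds() - cpu0;
    printf("%s: wall %.3f s, cpu %.3f s, cpu fraction %.2f\n",
           label, w, c, cpu_fraction(w, c));
    fflush(stdout);
}
```

To use it, record wall_seconds() and cpu_seconds() before a phase of the 
program (a computation step or a block of MPI calls), then call 
report_phase() afterwards, once with the program running alone and once 
with both programs running. Comparing the two reports shows in which 
phase the time goes and whether the process was on the CPU during it.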

--On 10 December 2002 2:56pm +0100 Miska Le Louarn <lelouarn at eso.org> wrote:

> Dear all,
>
> I am facing a strange problem (or "feature") related to either Linux or
> MPI or maybe their interaction.
>
> I have written two programs, in C, which both use MPI.
>
> I run these programs on a 6 node cluster of PCs, each PC running Linux.
> More precise hardware / software description at the end of this mail.
>
> When I run one program (either of the two, with the command mpirun),
> everything goes fine: the program doesn't crash and provides the right
> result. All PCs work happily and everything seems to be ok.
>
> I should say the two programs are completely independent (different
> executables and so on; there is no communication between the two).
>
> BUT when I try to run these two programs at the same time, one of them
> hangs. It just stops doing anything and sits there without crashing until
> the other program is completed. Then it starts to work again.
>
> I am surprised by this behavior. I would have expected that both programs
> run independently, slower (because they share resources like network and
> CPU) but still run. Now this one program hogs all resources and the other
> one just sits there doing nothing.
>
> I have also tried to run two copies of the first ("hog") process. Now one
> of the copies also freezes completely (but doesn't seem to restart once
> the hog process is finished).
>
> What I am doing now to avoid the problem is to run the programs
> sequentially. It would just be convenient sometimes to have the programs
> run at the same time, although slower.
>
> I haven't tried running the two programs as two different users. Maybe
> I should try that.
>
> So does anybody have any idea why this is ? Is it a Linux scheduler
> "feature" related to the network communication between the nodes (if I
> launch 2 non-MPI jobs, I get the standard slow-down) ? Or maybe
> interference inside MPI between the two processes ?
>
> Any tests I could do to see what is going on ?
>
> Thanks in advance,
>
> Miska
>
> Cluster:
>
> 5 nodes: Pentium IV, 1.8 GHz, 1 GB of RAM, running Linux RH 7.3 (stock
> kernel)
> Master: Pentium Xeon, 2 CPUs, 1 GB of RAM, RH 7.3 (stock SMP kernel)
> All machines have a Gigabit network card, and we have a gigabit switch.
>
> Software:
> MPICH 1.2.3
> Programs written in C, compiled with gcc 2.96-112 (have also tried gcc
> 3.2 without any change). The programs perform quite a lot of different
> operations: various computations, MPI communications, disk access on the
> local disk, etc.
>