[Beowulf] mpirun issue


Reuti reuti at staff.uni-marburg.de
Tue Oct 21 05:53:26 PDT 2008


Hi,

On 21.10.2008 at 01:18, Luis Alejandro Del Castillo Riley wrote:

> hi fellows i have a cluster with 1 master 10 nodes with intel Xeon  
> Quad core.
> Fedora core 6
> PGI 7.0-7
> mpich 1.2.5.2

The last version of MPICH, from 2005, is 1.2.7p1. For newer
installations I would suggest looking into Open MPI.

> machines.x86_64 with a 10 node names

Does this mean it lists only the 10 nodes?

> when i try to run:
>  mpirun -v -arch x86_64  -keep_pg -nolocal -np 9 mm5.mpp
>
> i had no error but when a run with
>  mpirun -v -arch x86_64  -keep_pg -nolocal -np 10 mm5.mpp
>
> they take around 40 min to send me and error :
> bm_list_4667: (1526.781250) wakeup_slave: unable to interrupt slave  
> 0 pid 4666

Since it takes so long, I would suggest logging in to all nodes and
checking the distribution and startup of the processes with:

$ ps -e f

(f without the -). Is it doing nothing for 40 minutes, or running fine
until it crashes?
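
To avoid logging in to each node by hand, the check above could be looped over the machines file. This is only a rough sketch: the function name check_nodes is made up for illustration, and it assumes passwordless ssh to the nodes and a machines file with one host per line, optionally followed by a ":n" CPU count.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): run "ps -e f" on
# every host named in a machines file, e.g. machines.x86_64.
# Assumes passwordless ssh; set SSH to use a different remote shell.
check_nodes() {
    machines_file=$1
    while read -r host rest; do
        # Skip blank lines and comment lines.
        case $host in ''|'#'*) continue ;; esac
        # Strip an optional ":n" CPU-count suffix (assumed file syntax).
        host=${host%%:*}
        echo "=== $host ==="
        # -n keeps ssh from swallowing the rest of the machines file on stdin.
        ${SSH:-ssh} -n "$host" ps -e f
    done < "$machines_file"
}
```

Calling `check_nodes machines.x86_64` while the job appears to hang should show, per node, whether mm5.mpp was started at all and which processes it is a child of.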

-- Reuti


