[Beowulf] problem of mpich-1.2.7p1

Gus Correa gus at ldeo.columbia.edu
Tue Feb 2 17:48:07 PST 2010


Hi Christian

Somehow your program was not attached to the message.

In any case, you didn't say anything about your "machinefile" contents.
You need to list the nodes you want to use there.
The command line will be something like this:

mpirun -np 4 -machinefile my_machinefile canon

"man mpirun" may help you with the details.
(I assume you are using the mpirun that comes with MPICH-1.)
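
As a rough sketch, assuming you want to use the four nodes that your
tstmachines run reported (cluster1 to cluster4), the machinefile could be
created as below. The file name my_machinefile is just the one used in the
command line above, and the host list is my assumption; put your own node
names there, one per line.

```shell
# Write a machinefile with one host name per line.
# The hosts are assumed from the tstmachines output; adjust to your cluster.
cat > my_machinefile <<'EOF'
cluster1
cluster2
cluster3
cluster4
EOF

# Show what was written before passing it to "mpirun -machinefile".
cat my_machinefile
```

If I remember correctly, with ch_p4 a node with more than one processor can
be listed as hostname:n (for instance cluster1:2) instead of repeating the
line, but please check the MPICH-1 documentation for your version.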

Having said that, I suggest that you move from MPICH-1 to
OpenMPI or to MPICH2.
MPICH-1 (mpich-1.2.7p1) is old, no longer maintained or supported,
and often breaks on current Linux kernels.
The MPICH developers also recommend upgrading to MPICH2.

OpenMPI and MPICH2 are free, easy to install, stable, up to date,
and more efficient than MPICH-1.
Upgrading to one of them is likely to save you trouble later,
especially with your tight deadline.

See:
http://www.open-mpi.org/
http://www.mcs.anl.gov/research/projects/mpich2/


I hope this helps,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------


christian suhendra wrote:
> Hello guys,
> I have installed mpich-1.2.7p1 on Ubuntu 9.04, and I have configured NFS
> and RSH.
> I use device=ch_p4,
> but when I ran my program it does not seem to work. I got this result:
> root@cluster3:/mirror/mpich-1.2.7p1# mpirun -np 1 canon
> Process 0 of 1 on cluster3
> Total Time: 4.316000 msecs
> root@cluster3:/mirror/mpich-1.2.7p1# mpirun -np 4 canon
> Process 0 of 4 on cluster3
> Total Time: 21.552000 msecs
> Process 2 of 4 on cluster2
> Process 1 of 4 on cluster1
> Process 3 of 4 on cluster1
> root@cluster3:/mirror/mpich-1.2.7p1#
> 
> The process only works on 1 node,
> but when I test the machines, it connects to all nodes:
> root@cluster3:/mirror/mpich-1.2.7p1# /mirror/mpich-1.2.7p1/sbin/tstmachines -v LINUX
> Trying true on cluster1 ...
> Trying true on cluster2 ...
> Trying true on cluster3 ...
> Trying true on cluster4 ...
> Trying ls on cluster1 ...
> Trying ls on cluster2 ...
> Trying ls on cluster3 ...
> Trying ls on cluster4 ...
> Trying user program on cluster1 ...
> Trying user program on cluster2 ...
> Trying user program on cluster3 ...
> Trying user program on cluster4 ...
> 
> I don't know exactly where the problem is that keeps my program from
> running on all nodes.
> Please help me.
> My deadline is about 1 week away.
> I am very much hoping for your help.
> 
> 
> I attached my program listing so you can test it on your system.
> Thank you very much.
> 
> 
> 
> 
> regards
> christian
> 
> 
> 