[Beowulf] problem of mpich-1.2.7p1

Gus Correa gus at ldeo.columbia.edu
Tue Feb 2 18:31:49 PST 2010


Hi Christian

What is the content of your file
/mirror/mpich-1.2.7p1/share/machines.LINUX?

Please send it in your next message; it may clarify things.
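A machines file is essentially a list of host names, so a few lines of Python are enough to see what a given file lists. This is a minimal sketch, not part of MPICH. It assumes one host per line with an optional `:N` suffix for the number of processes on that host, and it ignores blank lines and `#` comments. The `read_machines` helper and the file contents below are hypothetical; only the real machines.LINUX will show what mpirun is actually using.

```python
def read_machines(text):
    """Return (host, count) pairs from the text of a machines file.

    Assumed format (hypothetical sketch): one host name per line, an
    optional ":N" suffix giving N processes for that host, and blank
    lines or "#" comments ignored.
    """
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, _, count = line.partition(":")
        hosts.append((name, int(count) if count else 1))
    return hosts


# Hypothetical contents, NOT the actual machines.LINUX from this thread.
example = """\
# hypothetical machines.LINUX
cluster1
cluster2
cluster3
cluster4
"""

for host, count in read_machines(example):
    print(host, count)
```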


It looks to me like your program is working correctly.
(I am guessing a bit, because you didn't send the source code.)

When you did "mpirun -np 1 canon"
it ran one process on cluster3:
See:

 >>> Process 0 of 1 on cluster3
 >>> Total Time: 4.316000 msecs

When you did "mpirun -np 4 canon"
it ran two processes on cluster1 and one each on cluster2 and cluster3.

See:

 >>> Process 0 of 4 on cluster3
 >>> Total Time: 21.552000 msecs
 >>> Process 2 of 4 on cluster2
 >>> Process 1 of 4 on cluster1
 >>> Process 3 of 4 on cluster1
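The same tally can be done mechanically. The Python sketch below (the names `ranks_per_host` and `all_ranks_reported` are illustrative only) counts the "Process r of n on host" lines from the -np 4 run and checks that every rank from 0 to n-1 reported exactly once. On the output above it finds two ranks on cluster1 and one each on cluster2 and cluster3, i.e. all four ranks started and more than one node was used. It assumes the program prints exactly that line format, which is inferred from the output since the source was not attached.

```python
import re
from collections import Counter

# Line format inferred from the program's output, e.g. "Process 2 of 4 on cluster2".
LINE = re.compile(r"Process (\d+) of (\d+) on (\S+)")


def ranks_per_host(output):
    """Count how many ranks reported from each host."""
    return Counter(host for _, _, host in LINE.findall(output))


def all_ranks_reported(output):
    """True if ranks 0..n-1 each appear exactly once (n taken from the first match)."""
    found = LINE.findall(output)
    if not found:
        return False
    size = int(found[0][1])
    return sorted(int(rank) for rank, _, _ in found) == list(range(size))


# Output of "mpirun -np 4 canon", copied from the message.
run_np4 = """\
Process 0 of 4 on cluster3
Total Time: 21.552000 msecs
Process 2 of 4 on cluster2
Process 1 of 4 on cluster1
Process 3 of 4 on cluster1
"""

print(dict(ranks_per_host(run_np4)))
print("all ranks reported:", all_ranks_reported(run_np4))
```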

Did you expect more output than this?
Did you expect a different output?
Did you expect it to use a different set of computers?
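One thing worth checking on that last question: the tstmachines run in your message (quoted below) tested cluster1 through cluster4, but only cluster1, cluster2, and cluster3 appear in the -np 4 output. A set difference makes this explicit. The log strings here are copied from your message (trimmed to the relevant lines), and the regular expressions assume exactly those line formats.

```python
import re

# Logs copied from the original message, trimmed to the relevant lines.
tstmachines_log = """\
Trying true on cluster1 ...
Trying true on cluster2 ...
Trying true on cluster3 ...
Trying true on cluster4 ...
"""

run_np4 = """\
Process 0 of 4 on cluster3
Process 2 of 4 on cluster2
Process 1 of 4 on cluster1
Process 3 of 4 on cluster1
"""

tested = set(re.findall(r"Trying true on (\S+) \.\.\.", tstmachines_log))
used = set(re.findall(r"Process \d+ of \d+ on (\S+)", run_np4))

# Hosts that passed tstmachines but received no MPI process in this run.
unused = sorted(tested - used)
print("reachable but unused:", unused)
```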


Anyway, you would be better off upgrading to OpenMPI or MPICH2.
The README file in the OpenMPI tarball has all the information you
need to install it.
Chances are that MPICH1 will break in more complicated programs.

And remember not to run user-level programs as root.
That's not really safe.
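If you script your runs, one simple safeguard is to have the launcher refuse to start when it is invoked as root. A minimal Python sketch, assuming a POSIX system where `os.geteuid()` is available; `refuse_root` is a hypothetical helper, not something MPICH provides:

```python
import os
import sys


def refuse_root(euid=None):
    """Exit with an error message if the effective user id is 0 (root).

    `euid` may be passed in explicitly (e.g. for testing); otherwise it is
    read with os.geteuid(), which exists on POSIX systems only.
    Returns the effective user id when it is safe to continue.
    """
    if euid is None:
        euid = os.geteuid()
    if euid == 0:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("refusing to launch MPI jobs as root; log in as a normal user")
    return euid
```

Call `refuse_root()` at the top of such a script, before building the mpirun command line.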

I hope this helps.

Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Gus Correa wrote:
> PS - And don't run the programs as root!
> 
> Gus Correa
> 
> Gus Correa wrote:
>> Hi Christian
>>
>> Somehow your program was not attached to the message.
>>
>> In any case, you didn't say anything about your "machinefile" contents.
>> You need to list the nodes you want to use there.
>> The command line will be something like this:
>>
>> mpirun -np 4 -machinefile my_machinefile canon
>>
>> "man mpirun" may help you with the details.
>> (I assume you are using the mpirun that comes with mpich1.)
>>
>> Having said that, I suggest that you move from MPICH-1 to
>> OpenMPI or to MPICH2.
>> MPICH-1 (mpich-1.2.7p1) is old, not maintained or supported anymore,
>> and often breaks on current Linux kernels.
>> The MPICH developers also recommend upgrading to MPICH2.
>>
>> OpenMPI and MPICH2 are free, easy to install, stable, up to date,
>> and more efficient than MPICH1.
>> Upgrading to one of them is likely to avoid more trouble later,
>> especially with your tight deadline.
>>
>> See:
>> http://www.open-mpi.org/
>> http://www.mcs.anl.gov/research/projects/mpich2/
>>
>>
>> I hope this helps,
>> Gus Correa
>>
>>
>> christian suhendra wrote:
>>> hello guys
>>> I have installed mpich-1.2.7p1 on Ubuntu 9.04, and I have configured
>>> NFS and RSH.
>>> I use device=ch_p4,
>>> but when I run my program it doesn't seem to work. I get this result:
>>> root@cluster3:/mirror/mpich-1.2.7p1# mpirun -np 1 canon
>>> Process 0 of 1 on cluster3
>>> Total Time: 4.316000 msecs
>>> root@cluster3:/mirror/mpich-1.2.7p1# mpirun -np 4 canon
>>> Process 0 of 4 on cluster3
>>> Total Time: 21.552000 msecs
>>> Process 2 of 4 on cluster2
>>> Process 1 of 4 on cluster1
>>> Process 3 of 4 on cluster1
>>> root@cluster3:/mirror/mpich-1.2.7p1#
>>>
>>> The process only works on 1 node,
>>> but when I test the machines, it connects to all nodes:
>>> root@cluster3:/mirror/mpich-1.2.7p1# /mirror/mpich-1.2.7p1/sbin/tstmachines -v LINUX
>>> Trying true on cluster1 ...
>>> Trying true on cluster2 ...
>>> Trying true on cluster3 ...
>>> Trying true on cluster4 ...
>>> Trying ls on cluster1 ...
>>> Trying ls on cluster2 ...
>>> Trying ls on cluster3 ...
>>> Trying ls on cluster4 ...
>>> Trying user program on cluster1 ...
>>> Trying user program on cluster2 ...
>>> Trying user program on cluster3 ...
>>> Trying user program on cluster4 ...
>>>
>>> I don't know exactly where the problem is that keeps my program from
>>> running on all nodes.
>>> Please help me.
>>> My deadline is about 1 week away.
>>> I'm really hoping for your help.
>>>
>>>
>>> I attached my program listing so you can test it on your system.
>>> Thank you very much.
>>>
>>>
>>>
>>>
>>> regards
>>> christian
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit 
>>> http://www.beowulf.org/mailman/listinfo/beowulf



