[Beowulf] Error while tstmachines still not solved

akhtar Rasool akhtar_samo at yahoo.com
Mon Dec 27 00:55:08 PST 2004


Hi,

Actually, MPICH is installed only on the root (server) node, so how would the other nodes be able to see the path to the MPI binaries? As you wrote, the nodes need to see the executable program and the MPI libraries; please let me know how to set that up.

Every MPI program I run produces output, but the wall-clock time increases as the -np value increases, because the tasks are not running on the other node, only on the server.

I am using a two-node Linux 9 cluster with MPICH 1.2.5.2 as the MPI implementation.

I have to present my project on 30 December, so please help me solve this problem.

Akhtar


Glen Gardner <Glen.Gardner at verizon.net> wrote:
The error in the 5th step is caused by a chatty login message. This makes MPI complain, but it ought to work anyway.
You want to turn off the motd, and if you are using FreeBSD, create a file called ".hushlogin" in the user's home directory.
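A minimal sketch of the .hushlogin step, assuming a POSIX sh and that it is run as the user who launches MPI jobs, on every node. The helper name quiet_login is hypothetical. The login(1) program on Linux also honors ~/.hushlogin, but it only affects login itself: text printed by shell startup files such as ~/.cshrc (which tcsh reads even for rsh commands) would have to be silenced separately.

```shell
#!/bin/sh
# quiet_login (hypothetical helper): create an empty .hushlogin in the given
# home directory (default: $HOME) so that login(1) skips the message of the day.
quiet_login() {
  home_dir=${1:-$HOME}
  touch "$home_dir/.hushlogin"
}

# Example, run as the MPI user on each node:
#   quiet_login
```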

The next error has to do with the paths to MPICH and to the program being launched.
All the nodes need to be able to "see" the MPI binaries and the executable program.
The paths to MPI and to the program being launched need to be the same on all nodes, including the root node.
Make sure the path is set up properly in the environment. You may need to check your mount points and set up NFS properly.
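A minimal sketch of the NFS setup described above, assuming host1 is the NFS server, that MPICH was installed under /usr/local (the --prefix used in this thread), and a 192.168.0.0/255.255.255.0 client network inferred from the 192.168.0.25 address; adjust all three. The helper make_nfs_config is hypothetical and only prints the lines to add to /etc/exports on the server and /etc/fstab on each client. The same should be done for the directory holding the MPI program itself (for example, the users' home directories), so it appears at the same path on every node.

```shell
#!/bin/sh
# make_nfs_config (hypothetical helper): print the NFS lines that make one
# directory on the server visible at the same path on every client node.
#   $1 = server host name, $2 = directory to share, $3 = allowed clients
make_nfs_config() {
  server=$1
  dir=$2
  clients=$3
  echo "# Add to /etc/exports on $server, then run: exportfs -ra"
  echo "$dir $clients(ro,sync)"
  echo "# Add to /etc/fstab on each client node, then run: mount $dir"
  echo "$server:$dir $dir nfs defaults 0 0"
}

make_nfs_config host1 /usr/local 192.168.0.0/255.255.255.0
```

Read-only (ro) is enough for binaries and libraries; a directory where programs write output needs rw. Note that mounting the server's /usr/local on a client hides whatever that client had in its own /usr/local.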


The last one probably has to do with name resolution.
The root node usually won't need to be in the machines.linux file, but all other nodes need to be.
I believe you need to list machines by hostname, not IP address, so be sure that both machines have the same hosts file, the same .rhosts, etc.
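A small sketch of a syntax check for the machines file, assuming MPICH's one-host-per-line format (optionally host:n) and that you pass the file's path in; check_machines_file is a hypothetical helper. Following the advice above, it flags raw IPv4 addresses, and it also flags lines starting with "-" in case the dashes shown in the quoted machines.LINUX are literally present in the file. It does not test name resolution: each name must still resolve the same way on every node, for example through matching /etc/hosts entries.

```shell
#!/bin/sh
# check_machines_file (hypothetical helper): warn about machines-file entries
# that are not plain host names. Returns non-zero if any entry looks wrong.
check_machines_file() {
  status=0
  # Plain "read" (default IFS) trims leading/trailing blanks; the extra test
  # also handles a last line that has no trailing newline.
  while read -r entry || [ -n "$entry" ]; do
    case $entry in
      ''|'#'*)
        continue ;;                # blank line or comment
      -*)
        echo "leading dash: $entry"; status=1 ;;
      *[!0-9.]*)
        : ;;                       # contains a letter, colon, etc.: treat as a host name
      *)
        echo "IP address, use a host name: $entry"; status=1 ;;
    esac
  done < "$1"
  return $status
}
```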

Glen


The next message indicates that the path to the executable "mpichfoo" was not found.  

akhtar Rasool wrote:

After extracting MPICH in /usr/local, I ran:

1- tcsh

2- ./configure --with-comm=shared --prefix=/usr/local

3- make

4- make install

5- util/tstmachines

In the 5th step, the error was:

Errors while trying to run rsh 192.168.0.25 -n /bin/ls /usr/local/mpich/mpich-1.2.5.2/mpichfoo
unexpected response from 192.168.0.25:

> /bin/ls: /usr/local/mpich/mpich-1.2.5.2/mpichfoo: No such file or directory

The ls test failed on some machines.

This usually means that you do not have a common filesystem on all of the machines in your machines list; MPICH requires this for mpirun (it is possible to handle this in a procgroup file; see the……)

Other possible problems include:

The remote shell command rsh does not allow you to run ls.

See the documentation about remote shell and rhosts.

You have a common filesystem, but with inconsistent names.

See the documentation on the automounter fix.

1 error was encountered while testing the machines list for LINUX

only these machines seem to be available

host1
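The ls test that tstmachines reports on can be repeated by hand for any path, such as the MPICH binaries or the program being launched. A minimal sketch, assuming a POSIX sh, a machines file with one host per line (an optional ":n" suffix is stripped), and rsh as the remote shell; check_path is a hypothetical helper, and the RSH variable is an assumption that lets you substitute ssh. Run it from the server once the shared directories are mounted; every host should report ok.

```shell
#!/bin/sh
# check_path (hypothetical helper): ask every host in a machines file to run
# "/bin/ls PATH" through the remote shell, similar to the tstmachines ls test.
#   $1 = machines file, $2 = absolute path that must exist on every node
RSH=${RSH:-rsh}   # remote shell command; e.g. RSH=ssh (assumption)

check_path() {
  status=0
  while read -r host || [ -n "$host" ]; do
    case $host in ''|'#'*) continue ;; esac   # skip blanks and comments
    host=${host%%:*}                          # drop an optional ":n" CPU count
    # -n keeps the remote shell from reading stdin, which would swallow the file
    if $RSH -n "$host" /bin/ls "$2" >/dev/null 2>&1; then
      echo "ok: $host sees $2"
    else
      echo "MISSING: $host cannot see $2"
      status=1
    fi
  done < "$1"
  return $status
}
```

For example, check_path machines.LINUX /usr/local/bin/mpirun should print an ok line for each host once /usr/local is shared.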

Since this is only a two-node cluster, host1 is the server onto which MPICH is being installed, and 192.168.0.25 is the client.

rsh logs in freely on both nodes.

On the server side, the file "machines.LINUX" contains:

-192.168.0.25

-host1

Kindly help

Akhtar




_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


--
Glen E. Gardner, Jr.
AA8C
AMSAT MEMBER 10593
Glen.Gardner at verizon.net
http://members.bellatlantic.net/~vze24qhw/index.html



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.beowulf.org/pipermail/beowulf/attachments/20041227/0a9102c3/attachment.html>

