Beowulf help

Srikanth Gururajan gururajan at cira.wvu.edu
Fri Mar 15 09:37:06 PST 2002


Hello,
I am trying to build a cluster and currently have two machines hooked up. I 
plan to expand the cluster once I have this working.

Hardware Configuration:
1 pentium 200 MHz machine with 48 MB of RAM
1 pentium 100 MHz machine with 48 MB of RAM

Operating system:
RedHat Linux 6.2

I have made the installations on both machines exactly the same.
I then installed MPICH-1.2.3 on both machines, as a normal user, in 
exactly the same directories, with exactly the same options to 
"./configure".

I have modified the "/etc/hosts.equiv" file to include both machines on 
the network. At present I can "rsh" from one machine to the other and can 
also run a listing on the other machine from either one.
I am having trouble running the "tstmachines" script, which tests the 
availability of the machines for multinode processing. I get errors of 
the kind:

unexpected response from 192.168.1.1 :
-> /bin/ls : /home/srik/mpich-1.2.3/sbin/mpichfoo : no such file or directory

The explanation that comes along with this says:

The "ls" test failed on some machines. This usually means that you do not 
have a common file system on all of the machines in your machines list; 
MPICH requires this for mpirun (it is possible to handle this in a 
procgroup file; see documentation for more details).

Other possible problems include:
The remote shell command does not allow you to run "ls".
See documentation about remote shell and rhosts.

You have a common file system, but with inconsistent names.
See documentation on the automounter fix.
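
Judging from the error above, the check seems to amount to listing one 
test file (here .../sbin/mpichfoo) on every machine through the remote 
shell and flagging any machine that cannot see it. A minimal sketch of 
that idea in Python follows. It is not the actual tstmachines script; the 
function names and the second host address (192.168.1.2) are assumptions 
made for illustration.

```python
import subprocess


def remote_ls(host, path, remote_shell="rsh"):
    """Run `/bin/ls path` on `host` via the remote shell.

    Returns (exit_code, combined_output). Hypothetical helper, not part of MPICH.
    """
    proc = subprocess.run(
        [remote_shell, host, "/bin/ls", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout.strip()


def hosts_missing_file(hosts, path, run=remote_ls):
    """Return (host, output) for every host where `ls path` did not echo `path`."""
    missing = []
    for host in hosts:
        _code, output = run(host, path)
        # rsh does not reliably pass back the remote exit status,
        # so compare the output instead: `ls` of an existing file prints its path.
        if output != path:
            missing.append((host, output))
    return missing


if __name__ == "__main__":
    test_file = "/home/srik/mpich-1.2.3/sbin/mpichfoo"
    # 192.168.1.2 is an assumed address for the second machine.
    for host, output in hosts_missing_file(["192.168.1.1", "192.168.1.2"], test_file):
        print("unexpected response from %s: %s" % (host, output))
```

A machine shows up in the result when the file exists only on the local 
disk of the machine where it was created, which matches the "no common 
file system" explanation quoted above.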


I need help on this. I tried to mail the people at ANL, but I haven't heard 
anything from them in 3 days. Could someone please help me out on this?

Thanks,
srik.



