Beowulf help
Srikanth Gururajan
gururajan at cira.wvu.edu
Fri Mar 15 09:37:06 PST 2002
Hello,
I am trying to build a cluster and currently have two machines hooked up. I
plan to expand the cluster once I have this working.
Hardware Configuration:
1 Pentium 200 MHz machine with 48 MB of RAM
1 Pentium 100 MHz machine with 48 MB of RAM
Operating system:
Red Hat Linux 6.2
I have made the installations on both machines exactly the same.
I then installed MPICH-1.2.3 on both machines, as a normal user, in
exactly the same directories on both machines, with exactly the same
options to "./configure".
I have modified the "/etc/hosts.equiv" file to include both machines on
the network. At present I can "rsh" from one machine to the other and can
also run a remote directory listing ("ls") from either machine.
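The remote listing check described above can be sketched roughly as follows. This is only an illustration, not what "tstmachines" itself runs: the function names and the injectable "run" parameter are made up for the example, and it assumes "rsh" is on the PATH. It compares the output of "ls" with the expected path instead of trusting the exit code, because classic rsh does not pass the remote exit status back.

```python
import subprocess


def remote_ls_ok(host, path, run=subprocess.run):
    """Return True if `rsh host ls path` prints exactly `path`.

    `run` is injectable (hypothetical convenience) so the check can be
    exercised without a real rsh. The output is compared because classic
    rsh does not report the remote command's exit status.
    """
    cmd = ["rsh", host, "ls", path]
    try:
        result = run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # fold error text into the output
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # rsh missing, not executable, or the host did not answer in time
        return False
    return result.stdout.strip() == path


def check_hosts(hosts, path, run=subprocess.run):
    """Map each host to whether it can see `path` through rsh."""
    return {host: remote_ls_ok(host, path, run) for host in hosts}
```

For instance, check_hosts(["192.168.1.1", "192.168.1.2"], "/home/srik/mpich-1.2.3/sbin") would report, per host, whether that directory is visible; the second address is a placeholder for the other machine.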
I am having trouble running the "tstmachines" script, which tests the
availability of the machines for multinode processing. I get errors of
the following kind:
unexpected response from 192.168.1.1 :
-> /bin/ls : /home/srik/mpich-1.2.3/sbin/mpichfoo : no such file or directory
The explanation that comes along with this says:
The "ls" test failed on some machines. This usually means that you do not
have a common file system on all of the machines in your machines list; MPICH
requires this for mpirun (it is possible to handle this in a procgroup
file; see documentation for more details).
Other possible problems include:
  The remote shell command does not allow you to run "ls".
  See documentation about remote shell and rhosts.
  You have a common file system, but with inconsistent names.
  See documentation on the automounter fix.
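The message mentions handling a missing common file system with a procgroup file. As far as I understand the ch_p4 device, such a file starts with a "local N" line followed by one "host nprocs /full/path/to/program" line per remote machine, and is passed to mpirun with "-p4pg". The sketch below only builds that text; the helper name, the host address, and the program path are hypothetical, and the exact format should be checked against the MPICH documentation.

```python
def make_procgroup(entries, local_extra=0):
    """Build the text of a ch_p4-style procgroup file.

    entries     -- list of (host, nprocs, full_program_path) tuples,
                   one per remote machine (all values are illustrative)
    local_extra -- extra processes to start on the local machine, beyond
                   the one mpirun starts itself (assumed meaning of the
                   "local N" line)
    """
    lines = ["local %d" % local_extra]
    for host, nprocs, path in entries:
        # Each remote line names the program by its path on THAT machine,
        # so the two machines do not need to share a file system.
        lines.append("%s %d %s" % (host, nprocs, path))
    return "\n".join(lines) + "\n"
```

The returned string could be written to a file (say "pgfile", a made-up name) and used along the lines of "mpirun -p4pg pgfile myprog", assuming the program has been built at the listed path on each machine.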
I need help with this. I tried to mail the people at ANL, but I haven't heard
anything from them in 3 days. Could someone please help me out?
Thanks,
srik.