Scyld Beowulf / Diskless nodes / Installation trouble
Senthil Kandasamy
kandas2 at alum.rpi.edu
Wed Jan 2 17:50:38 PST 2002
Hi Guys,
Hopefully someone can help me out.
First of all, I am a Chemical Engineer/Biophysicist who is fairly familiar
with Linux.
I am trying to install/fix Beowulf on a cluster recently purchased by our
research group.
This cluster was bought before I joined the group, and Scyld Beowulf had
been installed on it (improperly).
Since no one else in our group was interested in parallel computing,
nobody had noticed that, although one could send computational jobs to
the individual nodes, the cluster could not run parallel jobs across
multiple nodes ("could not connect to host" is the error I get when I
run mpirun).
We have 1 master +15 diskless nodes, all dual processors.
The Scyld Beowulf (without the support, i.e. the $2 version) has been
installed on it.
However, I suspect that the NFS mounting of the individual nodes has not
been done correctly.
Since I do not have any documentation on how to set up diskless nodes
(I could not find any on the installation disk), I am kind of helpless.
The resources on the net and newsgroups have not been very helpful.
I tried reinstalling from the Scyld/Red Hat CD, but the setup process
never seems to deal with NFS mounting at all.
Once setup is finished, the nodes are up and running and can handle
individual jobs using bpsh.
But I can never connect to the nodes when I try to run a parallel job using
mpirun.
Is there any definitive (and up-to-date) documentation or HOWTO on
installing a diskless Beowulf cluster?
Any help would be greatly appreciated. It just kills me to see ~30 GFLOPS
sitting there unused while I hunt for computer time on other
supercomputers.
Thanks.
Senthil