newbie: rsh and pvm problems

Robert G. Brown rgb at phy.duke.edu
Wed Sep 13 05:19:52 PDT 2000


On Tue, 12 Sep 2000, Georgia Southern Beowulf Cluster Project wrote:

> Hello,
> 
> I'm part of a small undergrad team building a poor-man's beowulf out of 
> surplus computers (P200 to P166 for nodes).  Each of the nodes is diskless 
> and use the etherboot package to get a kernel image and boot an NFS-mounted 
> root partition (unique for each node) and NFS-mount a /home partition that 
> is shared across several computers (nodes, a server, and workstations).  Our 
> problem is that we can start pvm on our server, but it will not allow us to 
> add any of the workstations or nodes.  Also, the nodes will not start pvm 
> (saying that a pvmd.<uid> file is present, but it is not there, honest).  
> I've made sure that rsh works across all nodes, server, and workstations in 
> user space and that pvm works on the workstations and the server.  The 
> .rhosts files allow for each computer to access each other without any other 
> authentication.  Additionally, since all computers share a single /home 
> directory, every computer shares the same .rhosts files.  One oddity is that 
> the workstation will add the server under pvm, but not vice versa.  I hope 
> someone can enlighten me and that the info above is specific enough without 
> being too overbearing.  I find it easier to write more instead of less.  All 
> suggestions are welcome.

The latest version of pvm has a debug mode that tells you exactly where
it fails and why.  At a guess, you are failing for one of two reasons.
The most likely one is that key pvm environment variables or paths
aren't set when your server tries to start pvmd on the clients via rsh.
This can easily happen with /bin/bash or /bin/sh because rsh does not
pass your environment along, and the sh init files in /etc are typically
not executed for rsh commands either.  The less likely one is that /tmp
isn't writable on your nodes (but is on your server), or that you are
making the mistake of sharing a single writable /tmp across several
clients, which creates a race condition that prevents more than one
client from starting up.  pvmd keeps its pvmd.<uid> file in /tmp, which
would fit the complaint about a file you can't find.  Every client needs
its own writable /tmp.
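
One way to tell these two cases apart is to run the same small probe on
the server and then through rsh on a node, and compare the output.  What
follows is only a sketch under assumptions: check_node is a made-up
helper name, node1 is a placeholder hostname, the login shell on the
nodes is assumed to be Bourne-compatible, and pvmd is assumed to leave
its pvmd.<uid> file in /tmp.

```shell
# check_node RUNNER...: run a small probe through RUNNER, which must accept
# a command string as its final argument ("sh -c" locally, "rsh node1"
# remotely).  check_node is a hypothetical helper, not part of pvm.
check_node() {
    "$@" '
        echo "PVM_ROOT=${PVM_ROOT:-unset}"
        echo "PATH=$PATH"
        if [ -d /tmp ] && [ -w /tmp ]; then
            echo "tmp=writable"
        else
            echo "tmp=NOT-writable"
        fi
        for f in /tmp/pvmd.*; do
            [ -e "$f" ] && echo "leftover=$f"
        done
        true
    '
}

# Usage (node1 is a placeholder hostname):
#   check_node sh -c        # what the server itself sees
#   check_node rsh node1    # what a command started via rsh sees
```

If PVM_ROOT comes back "unset" through rsh but not locally, the
environment is the problem.  If tmp is not writable, or a leftover
pvmd.<uid> shows up on a node where no pvmd is running, look at /tmp.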

My personal suggestion is to:

  a) dump rsh (which sucks in so many ways anyway) in favor of ssh,
which is now at last totally legal since RSA Security jumped the gun and
put the RSA public-key patent in the public domain.  Duke was
anticipating this and is already moving to totally eliminate rsh, telnet
and ftp within the entire campus network.  ssh is measurably more
expensive than rsh, but the extra expense is almost certainly irrelevant
to pvm.  So it takes you 5 seconds to spawn and start up a large job
instead of 1 or 2; who cares?  As long as the large job runs for a few
thousand seconds or more, your marginal cost is way under a percent.  If
you use ssh, you can create /etc/environment and put

PVM_ROOT=/usr/share/pvm3
XPVM_ROOT=/usr/share/pvm3/xpvm
PVM_RSH=/usr/bin/ssh

in it, and these variables will then be set for all users for all ssh
invocations.  You will need to set up ssh so that password-free logins
work across all clients, but that is really not difficult.
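
Since /home is NFS-shared in the setup described, one key pair per user
in ~/.ssh should cover every machine at once.  A minimal sketch,
assuming OpenSSH-style file names (~/.ssh/id_rsa.pub and
~/.ssh/authorized_keys; other ssh implementations use different names).
authorize_key is a hypothetical helper name.

```shell
# authorize_key HOME_DIR PUBKEY_FILE
# Append the public key to HOME_DIR/.ssh/authorized_keys if it is not
# already there, and set the strict permissions sshd usually insists on.
# Hypothetical helper; file names assume OpenSSH.
authorize_key() {
    home_dir=$1
    pubkey=$2
    mkdir -p "$home_dir/.ssh" || return 1
    chmod 700 "$home_dir/.ssh"
    touch "$home_dir/.ssh/authorized_keys"
    # Append only if this exact key line is not already present.
    grep -qxF -f "$pubkey" "$home_dir/.ssh/authorized_keys" ||
        cat "$pubkey" >> "$home_dir/.ssh/authorized_keys"
    chmod 600 "$home_dir/.ssh/authorized_keys"
}

# Usage, run once per user on any machine that mounts the shared /home:
#   ssh-keygen -t rsa -N '' -f "$HOME/.ssh/id_rsa"   # empty passphrase
#   authorize_key "$HOME" "$HOME/.ssh/id_rsa.pub"
```

An empty passphrase is the simplest route to password-free logins but
leaves the private key unprotected on the NFS server; a passphrase plus
ssh-agent is the more careful alternative.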

  b) Be sure you get a 3.4 pvm revision that is later than (IIRC)
February of this year, as it has the new debug features and supports the
PVM_RSH variable.  If you get the pvm RPM from Red Hat 6.2 PowerTools, it
has all of this stuff and creates a stub shell script in /usr/bin to
eliminate path problems.  You still have to create the /etc/environment
file or make sure all users have these variables set in the rc file
(.bashrc, .cshrc, etc.) for their shell of choice.  I'd recommend doing
this whether or not you go for ssh.
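
For the per-user route, the rc-file version might look like the
fragment below.  It assumes bash, which normally reads ~/.bashrc when it
is started by rshd or sshd, and it reuses the paths from the
/etc/environment example above; adjust them to wherever pvm actually
lives on your systems.

```shell
# ~/.bashrc fragment (bash assumed; csh users would use setenv instead).
# Keep this above any "interactive shells only" early return, or commands
# started through rsh/ssh will never reach it.
PVM_ROOT=/usr/share/pvm3
XPVM_ROOT=/usr/share/pvm3/xpvm
PVM_RSH=/usr/bin/ssh    # leave this line out if you stay with rsh
export PVM_ROOT XPVM_ROOT PVM_RSH
```

With a shared /home, one edit per user reaches every node.  Running
something like rsh node1 'echo $PVM_ROOT' (node1 being a placeholder)
is a quick check that the setting survives a remote command.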

  c) If you want to get on the bleeding edge, visit the scyld.com
website (Scyld is also the host of the beowulf.org website) and check
out their "bproc" offering.  This is the beowulf-specific, extremely low
overhead alternative to rsh.  It makes no particular attempt to be
secure (in the sense of encrypting traffic, etc.), but it eliminates
most of the overhead of a remote shell and has lots of potential for
fabulosity.  I suspect that inside a year or two it will evolve into the
glue that converts a pile of PCs into a "true" supercomputer with
something like a unified operating system.  Curiously, when I check out
this website myself I can see links to beostatus and beosetup under
their software link but cannot find bproc itself.  I'm sure it is there
somewhere, though.

   rgb

> 
> Thank you,
> 
> Wes Wells
> 
> <><><><><><><><><><><><><><><><><><>
> Georgia Southern University
> Beowulf Cluster Project
> gscluster at hotmail.com
> 
> _______________________________________________
> Beowulf mailing list
> Beowulf at beowulf.org
> http://www.beowulf.org/mailman/listinfo/beowulf
> 

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu






