[Mailer-Daemon at icantbeli
    Art Edwards 
    edwards at icantbelieveimdoingthis.com
       
    Thu May 24 22:33:43 PDT 2001
    
    
  
I have confirmed that I can place files in /tmp on each node, read from them
and write to other files in /tmp, so, in principle, I can do anything I 
want with files.
I would like to point out that in a typical multiprocessor run, node 0 is
completely ignored. Only when I ask for n+1 nodes, where n is the number of
slave nodes, does node 0 activate. How do I ensure that node 0 is, well, node 0?
Art Edwards
On Thu, May 24, 2001 at 02:37:28PM -0400, Sean Dilda wrote:
> On Thu, 24 May 2001, Brian C Merrell wrote:
> 
> > On Thu, 24 May 2001, Sean Dilda wrote:
> > 
> > > Is there any reason the program itself can't run itself in the special
> > > way they want?  Anything you can do with rlogin or rsh can be done with
> > > bpsh, except for an interactive shell.  However, this can be mimicked
> > > through bpsh.  If you can give me some idea of what they are wanting to
> > > do, I might be able to help you find a way to do it without requiring an
> > > interactive shell.  Scyld clusters are designed to run background jobs
> > > on all of the slave nodes, not to run login services for users on the
> > > slave nodes.
> > >
> > 
> > Hmmm.  I guess this warrants some background info.
> > 
> > The cluster is not a new cluster.  It was previously built by someone else
> > who is now gone.  The cluster master node crashed, taking the system and
> > most of their data with it.  I am now trying to rebuild the cluster.  The
> > cluster previously used RH6.1 stock and followed more of a NOW model than
> > a beowulf model, although all the hardware was dedicated to the cluster,
> > not on people's desks.  I'm now trying to use Scyld's distro to bring the
> > cluster back up.  I'm pretty happy with it, and managed to get the master
> > node up with a SCSI software RAID array, and a few test nodes up with boot
> > floppies.  Seems fine to me.  BUT....
> > 
> > There are three reasons that they want to be able to rlogin to the
> > machines:  1) first, there are a number of people with independent
> > projects who use the cluster.  They are used to being able to simply log in
> > to the master, rlogin to a node, and start their projects on one or more
> > nodes, so that they take up only a chunk of the cluster.  2) Also, at
> > least one researcher was previously able to and wants to be able to
> > continue to log in to separate nodes and run slightly different (and
> > sometimes non-parallelizable) programs on his data.  3) ALSO, they have
> > code that they would rather not change.
> 
> Ok, I understand now.  All of these things can be handled with bpsh.
> Do you think these people will be happy with doing something like 'rsh
> <node> <command>' instead of rsh'ing in to get a shell and then run the
> command?  If so, you could probably get away with just symlinking
> /usr/bin/rsh to /usr/bin/bpsh
> > 
> > > It is possible to use BProc with a full install on every slave node;
> > > however, this reduces a lot of the easy administration features we've
> > > been trying to put into our distro.
> > >
> > 
> > I just set this up, and realize what you mean.  I had to statically define
> > IP addresses, users, etc.  At first it wasn't a pain, but I realized after
> > the first two that doing all 24 would be.  Even though it is now possible
> > to rlogin to different nodes, it wasn't what I was hoping for. I imagine
> > it will be particularly unpleasant when software upgrades need to be
> > performed. :(
> 
> This is one of the advantages of our software.  It is set up in such a
> way that you don't have to do so much work to keep the slave nodes up to
> date.
> 
> > 
> > I'm still hoping to find some happy medium, but I'm going to present these
> > options to the group and see what they think.  The problem is that they
> > are mathematicians and physicists, not computer people.  They really don't
> > want to have to change, even though it seems to be the same.
> > 
> > Also one thing I'm still trying to find a solution to: how can the nodes
> > address each other?  Previously they used a hosts file that had listings
> > for L001-L024 (and they would like to keep it that way).  I guess with the
> > floppy method they don't have to, because the BProc software maps node
> > numbers to IP addresses,
> 
> Perhaps you could write some sort of rsh replacement script that turns
> the L001-L024 names into the BProc node numbers, then call bpsh.  Would
> that be a happy medium?
----- End forwarded message -----
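
The rsh replacement suggested above could be sketched roughly as follows. This
is only an illustration under stated assumptions, not something from Scyld: it
assumes that host names L001-L024 correspond to BProc node numbers 0-23 (check
the real mapping on your cluster), and the function names are made up for the
example. It relies only on the `bpsh <node> <command>` form and does not handle
rsh options such as -l or -n.

```python
import re
import subprocess
import sys

# Assumption (not from the original thread): L001..L024 map to BProc
# node numbers 0..23. Adjust these if your cluster numbers nodes differently.
FIRST_HOST = 1
LAST_HOST = 24


def host_to_node(host):
    """Translate a legacy host name such as 'L007' into a BProc node number."""
    match = re.fullmatch(r"[Ll](\d{3})", host)
    if match is None:
        raise ValueError(f"not a cluster host name: {host!r}")
    index = int(match.group(1))
    if not FIRST_HOST <= index <= LAST_HOST:
        raise ValueError(f"host name out of range: {host!r}")
    return index - FIRST_HOST


def build_bpsh_command(host, command):
    """Build the argument list for 'bpsh <node> <command...>'."""
    return ["bpsh", str(host_to_node(host)), *command]


def main(argv):
    """Hypothetical entry point: rsh-wrapper <host> <command> [args...]."""
    if len(argv) < 3:
        print("usage: rsh-wrapper <host> <command> [args...]", file=sys.stderr)
        return 2
    try:
        bpsh_command = build_bpsh_command(argv[1], argv[2:])
    except ValueError as error:
        print(f"rsh-wrapper: {error}", file=sys.stderr)
        return 2
    # Run the command on the slave node and pass its exit status back.
    return subprocess.call(bpsh_command)
```

Installed as a script that ends with `sys.exit(main(sys.argv))`, a call such as
`rsh-wrapper L007 uname -a` would run `bpsh 6 uname -a` under the mapping
assumed here. Since bpsh has no interactive shell, a bare `rsh-wrapper L007`
just prints the usage message.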
    
    