Art Edwards
edwards at icantbelieveimdoingthis.com
Thu May 24 22:33:43 PDT 2001
I have confirmed that I can place files in /tmp on each node, read from them
and write to other files in /tmp, so, in principle, I can do anything I
want with files.
I would like to point out that in a typical multiprocessor run, node 0 is
completely ignored. Only when I ask for n+1 nodes, where n is the number of
slave nodes, does node 0 activate. How do I ensure that node 0 is, well, node 0?
Art Edwards
On Thu, May 24, 2001 at 02:37:28PM -0400, Sean Dilda wrote:
> On Thu, 24 May 2001, Brian C Merrell wrote:
>
> > On Thu, 24 May 2001, Sean Dilda wrote:
> >
> > > Is there any reason the program itself can't run itself in the special
> > > way they want? Anything you can do with rlogin or rsh can be done with
> > > bpsh, except for an interactive shell. However, this can be mimicked
> > > through bpsh. If you can give me some idea of what they are wanting to
> > > do, I might be able to help you find a way to do it without requiring an
> > > interactive shell. Scyld clusters are designed to run background jobs
> > > on all of the slave nodes, not to run login services for users on the
> > > slave nodes.
> > >
> >
> > Hmmm. I guess this warrants some background info.
> >
> > The cluster is not a new cluster. It was previously built by someone else
> > who is now gone. The cluster master node crashed, taking the system and
> > most of their data with it. I am now trying to rebuild the cluster. The
> > cluster previously used RH6.1 stock and followed more of a NOW model than
> > a beowulf model, although all the hardware was dedicated to the cluster,
> > not on people's desks. I'm now trying to use Scyld's distro to bring the
> > cluster back up. I'm pretty happy with it, and managed to get the master
> > node up with a SCSI software RAID array, and a few test nodes up with boot
> > floppies. Seems fine to me. BUT....
> >
> > There are three reasons that they want to be able to rlogin to the
> > machines: 1) first, there are a number of people with independent
> > projects who use the cluster. They are used to being able to simply login
> > to the master, rlogin to a node, and start their projects on one or more
> > nodes, so that they take up only a chunk of the cluster. 2) Also, at
> > least one researcher was previously able to and wants to be able to
> > continue to login to separate nodes and run slightly different (and
> > sometimes non-parallelizable) programs on his data. 3) ALSO, they have
> > code that they would rather not change.
>
> Ok, I understand now. All of these things can be handled with bpsh.
> Do you think these people will be happy with doing something like 'rsh
> <node> <command>' instead of rsh'ing in to get a shell and then run the
> command? If so, you could probably get away with just symlinking
> /usr/bin/rsh to /usr/bin/bpsh.
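[Editor's note: a minimal sketch of the symlink idea above, rehearsed in a scratch directory with a stub standing in for bpsh so nothing system-wide is touched. /usr/bin/bpsh is assumed to be where Scyld installs the real binary; verify the path before linking for real.]

```shell
#!/bin/sh
# Rehearse Sean's suggestion in /tmp first: make 'rsh <node> <command>'
# resolve to bpsh. The stub below only stands in for the real bpsh.
mkdir -p /tmp/bpsh-demo
cd /tmp/bpsh-demo

# Stub bpsh that just reports which node it was asked for.
printf '#!/bin/sh\necho "bpsh stub: node=$1"\n' > bpsh
chmod +x bpsh

# The actual idea: 'rsh' becomes a symlink to bpsh.
ln -sf "$PWD/bpsh" rsh

./rsh 3 hostname    # with the real bpsh, runs 'hostname' on node 3
```

Done system-wide, this would be `ln -s /usr/bin/bpsh /usr/bin/rsh` (after moving the real rsh aside), so users' existing `rsh <node> <command>` habits keep working.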
> >
> > > It is possible to use BProc with a full install on every slave node;
> > > however, this sacrifices many of the easy-administration features we've
> > > been trying to build into our distro.
> > >
> >
> > I just set this up, and realize what you mean. I had to statically define
> > IP addresses, users, etc. At first it wasn't a pain, but I realized after
> > the first two that doing all 24 would be. Even though it is now possible
> > to rlogin to different nodes, it wasn't what I was hoping for. I imagine
> > it will be particularly unpleasant when software upgrades need to be
> > performed. :(
>
> This is one of the advantages of our software. It is set up in such a
> way that you don't have to do so much work to keep the slave nodes up to
> date.
>
> >
> > I'm still hoping to find some happy medium, but I'm going to present these
> > options to the group and see what they think. The problem is that they
> > are mathematicians and physicists, not computer people. They really don't
> > want to have to change, even though it seems to be the same.
> >
> > Also, one thing I'm still trying to find a solution to: how can the nodes
> > address each other? Previously they used a hosts file that had listings
> > for L001-L024 (and they would like to keep it that way). I guess with the
> > floppy method they don't have to, because the BProc software maps node
> > numbers to IP addresses.
>
> Perhaps you could write some sort of rsh replacement script that turns
> the L001-L024 names into the BProc node numbers, then calls bpsh. Would
> that be a happy medium?
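[Editor's note: a sketch of such a wrapper, under the assumption that L001 corresponds to BProc node 0; the naming pattern and offset are guesses and would need to match the actual cluster.]

```shell
#!/bin/sh
# Hypothetical rsh-replacement wrapper: translates host names like
# L001-L024 into BProc node numbers and hands off to bpsh.
# Assumption: L001 is node 0; change the "- 1" offset if not.

name_to_node() {
    # Strip the leading "L" and any leading zeros, then shift to 0-based.
    n=$(echo "$1" | sed 's/^L0*//')
    echo $((n - 1))
}

if [ $# -gt 0 ]; then
    case "$1" in
        L[0-9][0-9][0-9])
            node=$(name_to_node "$1")
            shift
            exec bpsh "$node" "$@"   # run the command on that node
            ;;
        *)
            echo "usage: $0 LNNN command ..." >&2
            exit 1
            ;;
    esac
fi
```

Dropped in as `rsh` somewhere early in the users' PATH, `rsh L007 ./myprog` would become `bpsh 6 ./myprog`, which keeps the old hosts-file naming convention without touching the users' habits.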
----- End forwarded message -----