Why no rlogin to nodes?

Daniel Ridge newt at scyld.com
Mon Oct 16 11:45:04 PDT 2000


On Mon, 16 Oct 2000, Robert G. Brown wrote:

> Scyld Beowulf looks great, and I got MY CD for $2 at ALSC to try out in
> the next day or two;-) but it is a "beowulf in a box" for TRUE beowulfs
> -- head master node, headless slave nodes, isolated/protected network,
> no internal security.  It doesn't look like it is going to work for
> NOW/COW type arrangements or for heterogeneous clusters with a protected
> part and an unprotected part, although I'm not yet certain about the
> latter statement, since the head node might be able to be set up to use
> PVM or MPI across both the internal nodes and the external nodes.

PVM is already able to connect a number of MPPs into a single virtual
machine. We could think of a machine running Scyld Beowulf as an MPP.
 
> think that this would require a PVM hack (one can set e.g. PVM_RSH to
> bpsh OR ssh, but I'm not sure one could set it to either/both without
> making it a node-specific identifier rather than a virtual
> machine-specific environment variable, but I'm not certain.

If you have an environment where it would make sense to mix and
match ssh/bpsh, you could certainly write a wrapper script, call it
'foosh', that looked at each hostname and chose between the two for
you. Scyld's bpsh expects bare-digit node names -- which are
otherwise not valid hostnames -- so the notional 'foosh' could notice
this on the command line and choose the right method.
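
A minimal sketch of such a wrapper in Bourne shell follows. The
name 'foosh' and the all-digits test are just the convention
described above -- not anything shipped with Scyld Beowulf:

	#!/bin/sh
	# Hypothetical 'foosh': Scyld node names are bare digits,
	# which are otherwise not valid hostnames, so anything
	# containing a non-digit goes to ssh and anything all-numeric
	# goes to bpsh.
	host="$1"; shift
	case "$host" in
		*[!0-9]*) exec ssh  "$host" "$@" ;;
		*)        exec bpsh "$host" "$@" ;;
	esac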

For that matter, Scyld Beowulf doesn't prevent you from using
non-private IP addresses for cluster nodes and enabling IP forwarding
on the frontend -- if this suits your application.
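
For example (a sketch; the /proc knob is the standard Linux
mechanism, and making the setting persistent across boots varies by
distribution):

	# On the frontend, enable forwarding between the outside
	# interface and the cluster-side interface:
	echo 1 > /proc/sys/net/ipv4/ip_forward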

With respect to PVM (and similar environments), there is a
complexity beyond direct substitution of 'bpsh' for 'rsh'-alikes.

PVM for Linux expects to use 'rsh' to spawn a copy of the PVM
daemon on each node. This daemon is then responsible for fork()ing
and exec()ing a copy of the child process.

Under Scyld Beowulf, a set of PVM daemons running on all the nodes
probably wouldn't even have access to the binaries -- we normally
use bproc to send a copy of the frontend binary over to the target node.

For PVM, you would need a PVM architecture target that resembled the
'BEOLIN' target that Paul Springer constructed. In this way, a
single copy of the PVM daemon, running on the frontend, could 'bpsh'
in place of 'rsh' in the expected way.

I worked with an older version of Paul's patch. Alas, I was unable
to get it to run on the systems we have here. My intention was to
create a related PVM arch that also used our libraries to determine
the size of the machine and the like.

In a related vein, PBS has some of the same issues on our platform.
PBS expects to be able to run a 'mom' process on each node, and
expects that this 'mom' process will spawn the remote processes.

Under PBS, our process spawning and job control mechanism most closely
resembles the T3E build target -- and we feel that a target like this
is the most elegant way to support PBS.

If any of you maintain a system of this nature and you wish to have
it work cleanly with Scyld Beowulf, here are a couple of pointers:

	* You may wish to consider an architecture target which is
	  not the target you would otherwise use to support a collection
	  of Linux boxes. Again, in the PBS case, our model is closer to
	  a T3E than a NOW. You may be able to glue two arches together
	  simply.

	* If your software demands a collection of remote daemons
	  that are each capable of running jobs local to them -- you
	  can take advantage (at some expense) of a BProc technique.
	  In place of the ultimate fork()/execve() pair, you might
	  do something like:
	
		{
			int currnode;
			int child_pid;

			/* Assumes the usual unistd.h plus Scyld's bproc
			   header for the bproc_*() calls. */

			/* Where are we?  -1 means the frontend. */
			currnode = bproc_currnode();
			if (currnode != -1)
			{
				/* We're out on a node: fork a child onto
				   the frontend, where the binary lives. */
				child_pid = bproc_rfork(-1);
				if (child_pid == 0)
				{
					/* The child execs the binary and
					   carries the new image back to
					   this node. */
					bproc_execmove(currnode, *argv, argv,
						environ);
				}
			} else { /* we're already on the frontend */
				child_pid = fork();
				if (child_pid == 0)
				{
					execve(*argv, argv, environ);
				}
			}
		}

	Where I have taken liberties with error checking and completeness
	for the sake of clarity.

	This technique is sort of what I call the 'BProc slingshot'.
	We swing over to the frontend (node -1 in BProc parlance) to
	make a resource available and then swing back to the node to
	use it. I rarely suggest this approach for new application
	development -- but it can often be inserted into an existing
	application with little trouble.

	We use this technique, for instance, in our internal-use
	version of modutils. We can allow remote kernels to demand-load
	kernel modules by telling the modprobe program to move over to
	the frontend to fetch the actual kernel module and then swing
	back to the node to insert it. This means more remote process
	operations -- but they are fast and allow us to recycle the
	entire modutils package with a small and elegant patch.

Regards,	
	Dan Ridge
	Scyld Computing Corporation




