[Beowulf] motherboards for diskless nodes

Sun Feb 27 23:10:38 PST 2005

On Sat, 26 Feb 2005, Reuti wrote:

> Quoting John Hearns <john.hearns at streamline-computing.com>:
> 
> > On Fri, 2005-02-25 at 10:31 -0800, Greg Lindahl wrote:
> > 
> > > 
> > > Doesn't make any sense; I have seen people describe such systems where
> > > they download a disk image when a batch job wants a different software
> > > load. It's certainly doable that way: it does have different tradeoffs
> > > from the diskless case, but if it gives you a headache, it's probably
> > 
> > I've always dreamed of using User Mode Linux images for this.
> > In a Grid-based world, prepare a UML instance which has all the
> > libraries and runtime to run your code. Ship it across the grid with
> > your executable. 
> > The cluster at the receiving end can be running any distribution - it
> > runs your UML in a sandbox.
> 
> I would like to have it also: if any queuing system wants to kill a job on a 
> node: just shut down the virtual machine. And you also get rid of any semaphores 
> and shared memory segments (and message queues), which may be left behind in 
> other cases. I saw leftover semaphores not only on Linux, but also on AIX and 
> SuperUX in case of a job abort. Is there any safe way to release them after a 
> job? I already had the idea to catch them with a library which wraps the 
> shmget(), ... calls by using LD_PRELOAD to get the IDs, and then release them in 
> an epilog after the jobs (seems to work, but of course only for dynamically 
> linked applications).
> 
> Just got the hint to look at Meiosys. Seems they have such features in their 
> virtual machines.
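
For the leftover-IPC question above, a rough sketch of an epilog-side
alternative to the LD_PRELOAD wrapper (a different technique, not the one
described in the quote): instead of recording IDs at shmget() time, read
the Linux /proc/sysvipc tables after the job and build ipcrm commands for
everything the job's user still owns.  The function names and the
per-user policy are assumptions made for illustration; it is Linux-only,
and it would also remove objects belonging to another job of the same
user on that node, so it only fits nodes that run one job per user at a
time.

```python
# Map each /proc/sysvipc table to its ID column and the ipcrm option
# that removes that kind of object.
TABLES = {
    "shm": ("shmid", "-m"),
    "sem": ("semid", "-s"),
    "msg": ("msqid", "-q"),
}


def owned_ids(table_text, id_column, uid):
    """Return the IDs in one /proc/sysvipc table whose owner is uid."""
    lines = table_text.strip().splitlines()
    if not lines:
        return []
    # Locate columns by header name rather than by fixed position.
    header = lines[0].split()
    id_pos = header.index(id_column)
    uid_pos = header.index("uid")
    ids = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip anything that does not look like a table row
        if int(fields[uid_pos]) == uid:
            ids.append(int(fields[id_pos]))
    return ids


def cleanup_commands(uid, proc_dir="/proc/sysvipc"):
    """Build (but do not run) ipcrm commands for everything uid owns."""
    commands = []
    for name, (id_column, flag) in TABLES.items():
        try:
            with open(f"{proc_dir}/{name}") as handle:
                text = handle.read()
        except OSError:
            continue  # table not available on this system
        for ipc_id in owned_ids(text, id_column, uid):
            commands.append(["ipcrm", flag, str(ipc_id)])
    return commands
```

An epilog script would call cleanup_commands() with the job owner's
numeric uid and hand each resulting list to subprocess.run().  Unlike the
LD_PRELOAD approach this also covers statically linked applications, at
the price of not knowing which job created which object.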

Another place to look for stuff not unlike this is the COD project at
Duke.  Except that with COD the "sandbox" is the whole computer.  If
your application needs a specific operating system or resource
collection, you just prepare an appropriate image and boot the cluster
(diskless or not) into that image long enough to run the application,
then boot it back into something else.

Clumsy as this sounds (and obviously overkill for certain classes of
things), it has some significant advantages to consider.  In addition to
very definitely having all the right libraries and resources, there is
security -- not to worry, the images you load contain YOUR account
information and authentication information, so when you reboot into
something else later, the entire system is taken down.  Booting up a
cluster into a new image can take as little as a few minutes, which is
no big deal if the task will run for days.  It eliminates the need for
significant virtualization or something like VMware.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu