[Beowulf] Compute Node OS on Local Disk vs. Ram Disk

Wed Oct 1 05:52:29 PDT 2008

On Tue, 30 Sep 2008, Donald Becker wrote:

> Ahhh, your first flawed assumption.
>
> You believe that the OS needs to be statically provisioned to the nodes.
> That is incorrect.

Well, you also make the flawed assumption that the best technical 
solutions are always preferred. From my position I have seen many 
cases where political or administrative reasons have very much 
restricted the choice of technical solutions that could be used. Other 
reasons are related to the lack of flexibility from ISVs which provide 
applications in binary form only and make certain assumptions about 
the way the target cluster works. Yet another reason is the fact that 
a solution like Scyld's limits the whole cluster to running one 
distribution (please correct me if I'm wrong), while a solution with 
node "images" allows mixing Linux distributions at will.

> The only times that it is asked to do something new (boot, accept a 
> new process) it's communicating with a fully installed, up-to-date 
> master node.  It has, at least temporarily, complete access to a 
> reference install.

I think that this is another assumption that holds true for the Scyld 
system, but there are situations where it does not. Some years 
ago I developed a rudimentary batch system in which the master 
node contacted only the first node allocated/desired for the job; this 
node was then responsible for contacting the other nodes allocated/desired 
and starting the rest of the job. This was closely modelled on the 
way the naive rsh/ssh-based launchers for MPI jobs work: once mpirun 
is running, there is no connection to the master node, only between 
the node where mpirun is running and the rest of the nodes specified 
in the hosts file. I think that Torque has a similar design 
(Mother Superior being in control of the job), but I haven't looked 
closely at the details, so I might be wrong.
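
To make the launch model above concrete, here is a minimal sketch in 
Python. It is not the batch system I mentioned, nor how mpirun or Torque 
are implemented; the function names, the host names and the use of plain 
"ssh <host> <command>" are all assumptions made for illustration. It only 
computes who contacts whom and does not execute anything.

```python
import shlex


def read_hosts(text):
    """Parse a hosts-file style listing: one host per line, '#' starts a comment.

    Anything after the host name on a line (e.g. a slot count) is ignored.
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split()[0])
    return hosts


def plan_launch(master, hosts, command):
    """Return (origin, target, argv) tuples describing who contacts whom.

    Assumption for illustration: every remote start is "ssh <target> <command>".
    The master contacts only the first node; the first node contacts the rest.
    """
    if not hosts:
        raise ValueError("no nodes allocated")
    first, rest = hosts[0], hosts[1:]
    remote = shlex.join(command)
    plan = [(master, first, ["ssh", first, remote])]
    plan += [(first, node, ["ssh", node, remote]) for node in rest]
    return plan


if __name__ == "__main__":
    # Hypothetical host names and job command.
    nodes = read_hosts("node01\nnode02\nnode03\n")
    for origin, target, argv in plan_launch("master", nodes, ["./job", "--np", "3"]):
        print(f"{origin} -> {target}: {' '.join(argv)}")
```

The point of the sketch is that only the first tuple has the master as 
its origin; after that, the compute nodes talk only to each other.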

> If you design a cluster system that installs on a local disk, it's 
> very difficult to adapt it to diskless blades.  If you design a 
> system that is as efficient without disks, it's trivial to 
> optionally mount disks for caching, temporary files or application 
> I/O.

If you design a system that is flexible enough to allow you to use 
either diskless or diskful installs, what do you have to lose?
The same node "image" can be used in several ways:
- copied to the local disk and booted from there (the copying can be 
done as a separate operation followed by a reboot, or from the 
initrd)
- used over NFS-root
- used as a ramdisk, provided that the node "image" is small enough
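
As a rough illustration of the three uses listed above, the sketch below 
builds a kernel command line for each mode. The parameters root=, 
nfsroot=, ip= and ramdisk_size= are standard Linux kernel boot 
parameters, but the server address, export path, disk device, sizes and 
the 50% threshold are hypothetical values chosen for the example, and a 
real setup would also need a matching initrd and boot loader 
configuration.

```python
def kernel_cmdline(mode, *, nfs_server="10.0.0.1", export="/images/node",
                   disk="/dev/sda1", image_kib=262144):
    """Build a kernel command line for one way of using the same node "image".

    All default values are hypothetical placeholders, not recommendations.
    """
    if mode == "local-disk":
        # Image was copied to the local disk beforehand (or from the initrd).
        return f"root={disk} ro"
    if mode == "nfs-root":
        # Image is exported by a server and mounted as the root filesystem.
        return f"root=/dev/nfs nfsroot={nfs_server}:{export} ip=dhcp ro"
    if mode == "ramdisk":
        # Image is loaded into RAM; ramdisk_size is given in KiB.
        return f"root=/dev/ram0 rw ramdisk_size={image_kib}"
    raise ValueError(f"unknown mode: {mode}")


def fits_in_ram(image_kib, ram_kib, max_fraction=0.5):
    """Hypothetical "small enough" check: image may use at most max_fraction of RAM."""
    return image_kib <= ram_kib * max_fraction


if __name__ == "__main__":
    for mode in ("local-disk", "nfs-root", "ramdisk"):
        print(f"{mode:10s} {kernel_cmdline(mode)}")
```

The image itself is the same in all three cases; only the way the node 
finds its root filesystem at boot changes.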

Note: I have used "image" in this and previous e-mails to signify the 
collection of files that the node needs for booting; most likely this 
is not a FS image (like an ISO one), but it could also be one. Various 
documents call this a "virtual node FS", "chroot-ed FS", etc.

-- 
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de