[Beowulf] Compute Node OS on Local Disk vs. Ram Disk

Wed Oct 1 09:35:37 PDT 2008

On Wed, 1 Oct 2008, Bogdan Costescu wrote:

> On Tue, 30 Sep 2008, Donald Becker wrote:
> > Ahhh, your first flawed assumption.
> > You believe that the OS needs to be statically provisioned to the nodes.
> > That is incorrect.
> Well, you also make the flawed assumption that the best technical 
> solutions are always preferred. From my position I have seen many 
...
> a solution like Scyld's limits the whole cluster to running one 
> distribution (please correct me if I'm wrong), while a solution with 
> node "images" allows mixing Linux distributions at will.

That's correct.  Our model is that a "cluster" is a single system -- and a 
single install.

That's for a good reason: To keep the simplicity and consistency of 
managing a single installation, you pretty much can have... only a single 
installation.

There is quite a bit of flexibility.  The system automatically detects the 
hardware and loads the correct kernel modules.  Nodes can be specialized, 
including mounting different file systems and running different start-up 
scripts.  But the bottom line is that to make the assertion that remote 
processes will run the same as local processes, they have to be running 
pretty much the same system.

If you are running different distributions on nodes, you discard many of 
the opportunities of running a cluster.  More importantly, it's much 
more knowledge- and labor-intensive to maintain the cluster while 
guaranteeing consistency.

> > The only times that it is asked to do something new (boot, accept a 
> > new process) it's communicating with a fully installed, up-to-date 
> > master node.  It has, at least temporarily, complete access to a 
> > reference install.
> 
> I think that this is another assumption that holds true for the Scyld 
> system, but there are situations where this is not true.

Yes, there are scenarios where you want a different model.  But "connected
during important events" is true for most clusters.  We discard the
ability for a node to boot and run independently in order to get the
advantages of zero-install, zero-config consistent compute nodes.

> > If you design a cluster system that installs on a local disk, it's 
> > very difficult to adapt it to diskless blades.  If you design a 
> > system that is as efficient without disks, it's trivial to 
> > optionally mount disks for caching, temporary files or application 
> > I/O.
> 
> If you design a system that is flexible enough to allow you to use 
> either diskless or diskfull installs, what do you have to lose?

In theory that sounds good.  But historically, changing disk-based
installations to work on diskless machines has been very difficult, and
the results have been unsatisfactory.  Disk-based installations want to
do selective installation based on the hardware present, and they write
or modify many links and configuration files at install time -- many
more than they "need" to.

> The same node "image" can be used in several ways:
> - copied to the local disk and booted from there (where the copying
> could be done as a separate operation followed by a reboot or it can
> be done from initrd)
> - used over NFS-root
> - used as a ramdisk, provided that the node "image" is small enough   

While memory follows the price-down capacity-up curve, we aren't quite at
the point where the cost of holding a full OS distribution in memory is
negligible.  Most distributions (all the commercially interesting ones)
are workstation-oriented, and their trade-off is "disk is under $1/GB, so
we will install everything".  It's foreseeable that holding an 8GB install
image in memory will be trivial, but that will be a few years in the
future, not today.  And we will need better VM and PTE management to make
it efficient.
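
To put rough numbers on that trade-off, here is a minimal sketch in
Python.  It is only an illustration, not how any particular cluster
system decides: the function names, the 5% threshold and the node
sizes are all assumptions of mine.  The three modes are the ones
listed in the quoted message above (ramdisk, local disk, NFS-root).

```python
def ramdisk_share(image_gb: float, node_ram_gb: float) -> float:
    """Fraction of a node's RAM used by holding the OS image in memory."""
    if node_ram_gb <= 0:
        raise ValueError("node_ram_gb must be positive")
    return image_gb / node_ram_gb


def pick_mode(image_gb: float, node_ram_gb: float, has_local_disk: bool,
              max_share: float = 0.05) -> str:
    """Pick a deployment mode for one node image.

    max_share is a hypothetical cut-off: the largest fraction of RAM we
    are willing to give up to the image before it stops being "negligible".
    """
    if ramdisk_share(image_gb, node_ram_gb) <= max_share:
        return "ramdisk"
    if has_local_disk:
        return "local-disk"
    return "nfs-root"


if __name__ == "__main__":
    # Hypothetical nodes: (image size in GB, RAM in GB, has a local disk?)
    for image_gb, ram_gb, disk in [(8, 16, False), (8, 16, True), (0.5, 16, False)]:
        share = ramdisk_share(image_gb, ram_gb)
        print(f"{image_gb} GB image, {ram_gb} GB RAM: "
              f"{share:.1%} of memory -> {pick_mode(image_gb, ram_gb, disk)}")
```

With these made-up numbers, an 8GB image takes half the memory of a
16GB node, so it falls back to local disk or NFS-root; only a heavily
trimmed image fits under the assumed threshold.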

-- 
Donald Becker				becker at scyld.com
Penguin Computing / Scyld Software
www.penguincomputing.com		www.scyld.com
Annapolis MD and San Francisco CA