[Beowulf] distributions
Donald Becker
becker at scyld.com
Mon Feb 6 16:07:50 PST 2006
On Fri, 3 Feb 2006, Geoff Jacobs wrote:
> Robert G. Brown wrote:
> > Note well that for most folks, once you get a cluster node installed and
> > working, there is little incentive to EVER upgrade the OS except for
> > administrative convenience. We still have (I'm ashamed to say) nodes
> > running RH 7.3. Why not? They are protected behind a firewall, they
> > are stable (obviously:-), they work.
>
> Hmm... crispy on the outside, tender on the inside.
>
> You have to have an OS installed on your nodes for which security
> updates are available
I'll disagree with that point.
The Scyld Beowulf system introduced a unique design that has "slave
nodes" with nothing installed and no network file system required.
This avoids most security problems, including the need for updates,
because the vulnerable services and programs just aren't there.
Only master nodes have an OS distribution. The other nodes are
server slaves or compute slaves. At boot time they are given a kernel by a
master, queried by the master for their installed hardware, and then given
device drivers to load. When it's time to start a server or compute
process, they are told the versions of the libraries and executables for
the process, which they should cache or reuse from a hidden area.
The hidden area is usually a ramdisk file system. The objects there are
only libraries and executables, tracked by version information, so
that they may be reused and shared.
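In pseudo-Python, the cache logic amounts to something like the following.
The names and the fetch transport are made up for illustration; this is a
sketch of the idea, not our actual code:

    import os

    CACHE_DIR = "/ramdisk/cache"   # the hidden area on the slave

    def cached_path(name, version):
        """Map an (object, version) pair to a unique file in the cache."""
        return os.path.join(CACHE_DIR, "%s@%s" % (name.replace("/", "_"), version))

    def ensure_object(name, version, fetch):
        """Return a local path for the object, fetching from a master on a miss.

        fetch(name, version, dest) stands in for whatever transport the
        master uses to push bytes to the slave.
        """
        path = cached_path(name, version)
        if not os.path.exists(path):       # miss: first use or a new version
            os.makedirs(CACHE_DIR, exist_ok=True)
            fetch(name, version, path)     # pull exactly this version
        return path                        # hit: shared by every process that asks

Because the version is part of the cache key, a stale copy can never be
reused by accident, and identical objects are stored once no matter how
many processes use them.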
One way to think of the result is a dynamically generated minimal
distribution that has only the specific kernel, device drivers,
application libraries and executables needed to run an application or
service. But it's even simpler than that, since the node doesn't even
have initialization and configuration files.
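You can get a feel for how small that per-application set really is by
walking one executable's shared library dependencies with ldd(1). A quick
sketch, again only an illustration (Scyld computes and ships the set
automatically):

    import subprocess

    def library_closure(executable):
        """Return the resolved shared-library paths the executable needs."""
        out = subprocess.run(["ldd", executable], capture_output=True,
                             text=True, check=True).stdout
        libs = set()
        for line in out.splitlines():
            # Lines look like: "libm.so.6 => /lib/libm.so.6 (0x...)"
            if "=>" in line:
                target = line.split("=>", 1)[1].split()[0]
                if target.startswith("/"):
                    libs.add(target)
        return libs

    print(sorted(library_closure("/bin/ls")))   # typically a handful of files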
> down. You should know as well as I do that your users are scientists and
> academics, they are not security professionals. They'll pick bad
> passwords, log in from Winblows terminals which have been infected with
> viruses, keyloggers, etc. In short, you lower the security of the system
> to that of your least secure user.
>
> At least if your systems are updated, the chance of an attack escalating
> privilege and making the situation serious is small. You can give Mr.
> Sloppy The Lecture, restore his files from backup, and be on your merry way.
Or you can have compute nodes with only the application running. No
daemons are running to break into, and there are no local
system configuration files in which to hide persistent changes, e.g.
cracks that start up again after a reboot.
You still have to keep masters updated, but they are few (just enough for
redundancy) and in an internet server environment they don't need to be
exposed to the outside world ("firewalled" by the single-purpose slaves
with zero installations).
A final note: The magic of a dynamic "slave node" architecture isn't a
characteristic of the mechanism used to start or control processes. It
does interact with the process subsystem -- it needs to know what to cache
(including the specific version) to start up processes accurately. But the
other details, such as the library call interface, process control,
security mechanism, naming, and cluster membership, are almost unrelated.
Nor does "ramdisk root" give you the magic. A ramdisk root is part of how
we implement the architecture, especially the part about not requiring local
storage or network file systems to work. (Philosophy: You mount file
systems as needed for application data, not for the underlying system.)
But to be fast, efficient, and effective, the ramdisk can't just be a
stripped-down full distribution. You need a small 'init' system and
a dynamic, version-based caching mechanism. Otherwise you end up with
lots of wasted memory and version skew, and still have a crippled compute
node environment.
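Roughly, the slave side reduces to a loop like the one below. The
transport, the job format, and the module loading are all stand-ins, not
our protocol; ensure_object is the cache sketch from earlier:

    import os, subprocess

    def slave_init(master, ensure_object):
        """master supplies the transport; ensure_object is the cache above."""
        master.send(("hardware", open("/proc/cpuinfo").read()))  # stand-in inventory
        for module in master.recv():                  # drivers the master chose
            subprocess.run(["modprobe", module], check=True)
        while True:
            objects, argv = master.recv()             # [(name, version), ...], argv
            paths = [ensure_object(n, v, master.fetch) for n, v in objects]
            if os.fork() == 0:
                os.execv(paths[0], argv)              # child becomes the process
            os.wait()                                 # a real init does much more

Everything else, including naming, scheduling, and security, lives on the
master side, which is the point.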
--
Donald Becker becker at scyld.com
Scyld Software Scyld Beowulf cluster systems
914 Bay Ridge Road, Suite 220 www.scyld.com
Annapolis MD 21403 410-990-9993