[Beowulf] naked machines
Donald Becker
becker at scyld.com
Thu Sep 22 13:09:34 PDT 2005
On Thu, 22 Sep 2005, massimiliano cialdi wrote:
> is it possible to run a cluster in which some machines are "naked"?
> I mean a computer with only a motherboard (with an integrated NIC to
> boot up), CPU, and RAM; without any mass storage device (such as a hard
> disk, floppy, or CD), keyboard, mouse, or graphics card.
This is an excellent way to run a cluster for some applications.
In my opinion, "diskless administration" is the correct baseline design
for all cluster compute nodes. It does require careful system design to
be effective.
"Diskless administration" is the concept that all diagnostics, hardware
configuration, and initialization is done using only the processor, memory
and network interface. File systems, whether they are from a local disk,
NAS, or network, are selected and mounted to support applications, not the
underlying infrastructure.
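As a rough illustration (this is not the exact Scyld mechanism, and the
server and device names are made-up examples), a node operating this way
netboots a kernel and ramdisk and then mounts filesystems only where an
application needs them:

    # The node netboots: DHCP and TFTP deliver a kernel plus an initramfs,
    # and the root filesystem lives entirely in RAM -- no disk, keyboard,
    # or display is needed for the OS itself.
    #
    # Filesystems are then mounted only to support applications
    # (hostname "master" and /dev/sda1 are hypothetical):
    mount -t nfs -o ro master:/apps /apps    # application binaries and data
    mount -t nfs       master:/home /home    # user home directories
    mount /dev/sda1 /scratch                 # local scratch, only if a disk exists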
An architecture designed around diskless administration is only efficient
and effective if you avoid filling up your memory with needless cruft, and
have few or no configuration files that may need to be updated. We solved
many problems in the Scyld system by treating compute nodes as
"compute slaves".
Slave nodes are provisioned by a master machine. The master machine has a
full installation, with all the usual configuration files, tables, drivers
and services. Compute slaves are set up with only what they need to
accept applications. They don't run services or daemons, or have any
of the other cruft associated with a full installation. This has many
advantages:

  - they don't waste memory on daemons and services
  - they don't run the lengthy initialization scripts of a full install
    (we can provision a node in under a second!)
  - they don't need the configuration files for these services, or the
    administrative effort of keeping those files synchronized.  What does
        /bin/ls -R -f -1 /etc | wc
    report for your full installation?  (A concrete run follows this list.)
  - big-memory applications run faster:
      - the virtual-to-physical mapping is nearly unfragmented from boot
      - there is a much better chance of using 4MB pages
      - fewer TLB misses improve performance, sometimes dramatically
      - with only applications running, memory _stays_ unfragmented
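For concreteness, here is that count run on its own. The first number wc
prints is roughly the number of files and directories under /etc (ls -R also
emits per-directory headers and blank lines, so it overcounts a little); the
exact figure varies by distribution, but on a full install it is typically
in the thousands:

    # roughly how many entries live under /etc on this machine?
    /bin/ls -R -f -1 /etc | wc
    # first column ~= files and directories under /etc;
    # a compute slave needs few or none of these files at all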
Now to make this work effectively, you need some additional subsystems.
The most obvious is starting up applications: you need to make certain
that the required executables and libraries exist on the compute slave.
But this problem is also an opportunity: the same mechanism can verify
that you are using the correct version (which might not be "current"!) of
the libraries and executable. In Scyld we have evolved to using a
separate subsystem that caches libraries and executables as whole files,
with additional version information. By caching whole files we eliminate
locking, make version tracking easier, never encounter "page-in" failures,
and can continue to run even when the originating machine can't be
reached. It also means that applications have predictable performance,
always running at full speed from a local copy.
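A minimal sketch of the whole-file idea, not Scyld's actual subsystem: the
cache directory, the "master" hostname, and the use of ssh/scp with an
md5sum as the version key are all assumptions made for illustration.

    #!/bin/sh
    # Hypothetical sketch: cache whole files from the master, keyed by version.
    MASTER=master                     # assumed master hostname
    CACHE=/var/cache/nodefiles        # assumed local cache directory

    fetch_cached() {
        path=$1                                   # e.g. a library or executable
        # ask the master which version (checksum) is current
        ver=$(ssh "$MASTER" md5sum "$path" 2>/dev/null | awk '{print $1}')
        if [ -n "$ver" ]; then
            local_copy="$CACHE/$ver$path"
            if [ ! -f "$local_copy" ]; then
                mkdir -p "$(dirname "$local_copy")"
                scp -q "$MASTER:$path" "$local_copy"   # whole-file copy, once
            fi
        else
            # master unreachable: keep running from whatever version we cached
            local_copy=$(ls "$CACHE"/*"$path" 2>/dev/null | head -n 1)
        fi
        echo "$local_copy"
    }

An application launcher on the slave would then exec the returned local
copy; the point is that fetches and version checks happen per whole file,
never per page.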
There are a bunch of other mechanisms, but this message is already getting
pretty long.
Random Background Info
The original diskless architecture was introduced by Sun in the mid-1980s.
Once network booting was complete, NFS was used to provide all
subsequent files: system utilities, configuration files, applications and
user home directories. Other than the mount table, local system
operation was unchanged from a full installation.
A related design was "dataless", where the system infrastructure and
scratch space are held on local disks, but the user's home directory and
perhaps applications are mounted over NFS.
Diskless operation with NFS root was very clever and innovative. Most
system architects hadn't noticed that machines had gotten fast enough that
paging over a network was feasible. NFS root saved the considerable cost
of a disk on each machine, as well as reducing the overall disk space
needed.
Having a single point of updates was considered a secondary effect, and
was more of a problem than an opportunity. NFS was conceived while file
servers were still being debugged, and was therefore designed to be
stateless (with idempotent operations) so that clients could keep working
through frequent file server crashes. (NFS won out over RFS, a contemporary
network file system, in part because of this connectionless model.) This
meant that updating an in-use executable or library risked leaving a client
still using the old version with no choice but to hang or crash the
application.
The introduction of "dataless workstations" was considered a major
improvement. For workstation operation "dataless" is much more efficient
and reliable. The NFS server traffic is considerably reduced: most
executable and library files are served from local disks. Paging
executables from local files also means better decisions can be made about
page caching, while avoiding the unhandleable failure when an NFS server
is temporarily unreachable. Applications can potentially handle errors
from open(), read(), or write(), but they never even get to see a failure
to page in their next instruction.
--
Donald Becker becker at scyld.com
Scyld Software Scyld Beowulf cluster systems
914 Bay Ridge Road, Suite 220 www.scyld.com
Annapolis MD 21403 410-990-9993