[Beowulf] naked machines
Donald Becker
becker at scyld.com
Thu Sep 22 13:09:34 PDT 2005
On Thu, 22 Sep 2005, massimiliano cialdi wrote:
> is it possible to run a cluster in which some machines are "naked"?
> I mean a computer with only a motherboard (with an integrated NIC to
> boot up), CPU, and RAM; without any mass storage device (such as a hard
> disk, floppy, or CD), keyboard, mouse, or graphics card.
This is an excellent way to run a cluster for some applications.
In my opinion, "diskless administration" is the correct baseline design
for all cluster compute nodes. It does require careful system design to
be effective.
"Diskless administration" is the concept that all diagnostics, hardware
configuration, and initialization is done using only the processor, memory
and network interface. File systems, whether they are from a local disk,
NAS, or network, are selected and mounted to support applications, not the
underlying infrastructure.
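As a rough illustration (this is not the exact Scyld mechanism, and the
server and device names are made-up examples), a node operating this way
netboots a kernel and ramdisk and then mounts filesystems only where an
application needs them:

    # The node netboots: DHCP and TFTP deliver a kernel plus an initramfs,
    # and the root filesystem lives entirely in RAM -- no disk, keyboard,
    # or display is needed for the OS itself.
    #
    # Filesystems are then mounted only to support applications
    # (hostname "master" and /dev/sda1 are hypothetical):
    mount -t nfs -o ro master:/apps /apps    # application binaries and data
    mount -t nfs       master:/home /home    # user home directories
    mount /dev/sda1 /scratch                 # local scratch, only if a disk exists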
An architecture designed around diskless administration is only efficient
and effective if you avoid filling up your memory with needless cruft, and
have few or no configuration files that may need to be updated. We solved
many problems in the Scyld system by treating compute nodes as
"compute slaves".
Slave nodes are provisioned by a master machine. The master machine has a
full installation, with all the usual configuration files, tables, drivers
and services. Compute slaves are set up with only what they need to
accept applications. They don't run services or daemons, or have any
of the other cruft associated with a full installation. This has many
advantages:

  - they don't waste memory on daemons and services
  - they don't run the lengthy initialization scripts of a full install
    (we can provision a node in under a second!)
  - they don't need the configuration files for these services, or the
    administrative effort of keeping those files synchronized.  What does
        /bin/ls -R -f -1 /etc | wc
    report for your full installation?  (A concrete run follows this list.)
  - big-memory applications run faster:
      - the virtual-to-physical mapping is nearly unfragmented from boot
      - there is a much better chance of using 4MB pages
      - fewer TLB misses improve performance, sometimes dramatically
      - with only applications running, memory _stays_ unfragmented
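For concreteness, here is that count run on its own. The first number wc
prints is roughly the number of files and directories under /etc (ls -R also
emits per-directory headers and blank lines, so it overcounts a little); the
exact figure varies by distribution, but on a full install it is typically
in the thousands:

    # roughly how many entries live under /etc on this machine?
    /bin/ls -R -f -1 /etc | wc
    # first column ~= files and directories under /etc;
    # a compute slave needs few or none of these files at all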
Now to make this work effectively, you need some additional subsystems.
The most obvious is starting up applications: you need to make certain
that the required executables and libraries exist on the compute slave.
But this problem is also an opportunity: the same mechanism can verify
that you are using the correct version (which might not be "current"!) of
the libraries and executable. In Scyld we have evolved to using a
separate subsystem that caches libraries and executables as whole files,
with additional version information. By caching whole files we eliminate
locking, make version tracking easier, never encounter "page-in" failures,
and can continue to run even when the originating machine can't be
reached. It also means that applications have predictable performance,
always running at full speed from a local copy.
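A minimal sketch of the whole-file idea, not Scyld's actual subsystem: the
cache directory, the "master" hostname, and the use of ssh/scp with an
md5sum as the version key are all assumptions made for illustration.

    #!/bin/sh
    # Hypothetical sketch: cache whole files from the master, keyed by version.
    MASTER=master                     # assumed master hostname
    CACHE=/var/cache/nodefiles        # assumed local cache directory

    fetch_cached() {
        path=$1                                   # e.g. a library or executable
        # ask the master which version (checksum) is current
        ver=$(ssh "$MASTER" md5sum "$path" 2>/dev/null | awk '{print $1}')
        if [ -n "$ver" ]; then
            local_copy="$CACHE/$ver$path"
            if [ ! -f "$local_copy" ]; then
                mkdir -p "$(dirname "$local_copy")"
                scp -q "$MASTER:$path" "$local_copy"   # whole-file copy, once
            fi
        else
            # master unreachable: keep running from whatever version we cached
            local_copy=$(ls "$CACHE"/*"$path" 2>/dev/null | head -n 1)
        fi
        echo "$local_copy"
    }

An application launcher on the slave would then exec the returned local
copy; the point is that fetches and version checks happen per whole file,
never per page.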
There are a bunch of other mechanisms, but this message is already getting
pretty long.
Random Background Info
The original diskless architecture was introduced by Sun in the mid-1980s.
Once network booting was complete, NFS was used to provide all
subsequent files: system utilities, configuration files, applications and
user home directories. Other than the mount table, local system
operation was unchanged from a full installation.
A related design was "dataless", where the system infrastructure and
scratch space are held on local disks, but the user's home directory and
perhaps applications are mounted over NFS.
Diskless operation with NFS root was very clever and innovative. Most
system architects hadn't noticed that machines had gotten fast enough that
paging over a network was feasible. NFS root saved the considerable cost
of a disk on each machine, as well as reducing the overall disk space
needed.
Having a single point of updates was considered a secondary effect, and
was more of a problem than an opportunity. NFS was conceived while file
servers were still being debugged, and was therefore designed to be
stateless (with idempotent operations) so that clients could keep working
through frequent file server crashes. (NFS won out over RFS, a contemporary
network file system, in part because of this connectionless model.) This
meant that updating an in-use executable or library risked leaving a client
still using the old version with no choice but to hang or crash the
application.
The introduction of "dataless workstations" was considered a major
improvement. For workstation operation "dataless" is much more efficient
and reliable. The NFS server traffic is considerably reduced: most
executable and library files are served from local disks. Paging
executables from local files also means better decisions can be made about
page caching, while avoiding the unhandleable failure when an NFS server
is temporarily unreachable. Applications can potentially handle errors
from open(), read(), or write(), but they never even get to see a failure
to page in their next instruction.
--
Donald Becker becker at scyld.com
Scyld Software Scyld Beowulf cluster systems
914 Bay Ridge Road, Suite 220 www.scyld.com
Annapolis MD 21403 410-990-9993