Clustering both linux and unix

Robert G. Brown rgb at phy.duke.edu
Tue Oct 16 06:15:36 PDT 2001


On Tue, 16 Oct 2001, Senol Tekdal wrote:

>
>   can i set up a cluster that work on both linux and unix?

What exactly do you mean by this?  Linux "is" a form of unix.  Do you
mean with linux and e.g. freebsd and/or solaris and/or some other unix
variant?

If so, the answer is sure, easily.  However, the cluster won't be a
"true beowulf" unless you work very hard writing new glueware, since
true beowulf software like the Scyld distribution is mostly based on the
assumption of linux homogeneity.

However, heterogeneous clusters long predate linux itself -- the PVM
(Parallel Virtual Machine) library was designed from the beginning to
glue together heterogeneous hosts (both different hardware and different
operating systems) into a single "parallel supercomputer" where one had
merely to build the key binaries once on each architecture represented
and save them in certain standard locations in your pvm directory
hierarchy.  PVM is available prebuilt in all the major linux
distributions.  PVM is also either available prebuilt or easily built
from source for pretty much all the other unices.  It can probably even
be built on WinXX platforms that have a real network stack, e.g. NT or
2K, so one could probably even set up a parallel/cluster computation
that spanned linux, freebsd, macos x, WinNT, solaris, and more.  I
personally don't work much with MPI, but I'd guess that current MPI
releases will also work across heterogeneous clusters with the right
communications channel.  Embarrassingly parallel and script-based
distribution of simple independent threads (like RC5 or SETI) can also
easily be run on a heterogeneous cluster.

There are problems to overcome, of course.  Heterogeneous clusters
(especially highly heterogeneous clusters) that aren't master/slave
compute farms running what are >>effectively<< embarrassingly parallel
computations are much harder to program efficiently than homogeneous
clusters.  By this I mean that if your computation is synchronous and
advances on each node up to a barrier (where nodes have to "talk" to one
another before the computation proceeds), then you have to work quite
hard to split the task up so that each node's subtask takes the same
time to complete, even though the nodes are different hardware running
different OS's.

You also have to deal with stability problems -- a distributed, tightly
coupled computation with no checkpointing very likely will have to be
restarted if any node goes down in mid-calculation.  If you are using
128 nodes and 64 of them are running WinXX and the computation runs for
a whole day before it finishes with a lot of memory management going on,
your odds of EVER completing the work are very slender.  This isn't just
picking on Windows (however satisfying it is to do so;-), if you split
it up across ANY two or three or four OS's, you are subjecting yourself
to weak-link instability across all of the choices.  One bad memory
manager, one buggy communications stack, and your whole computation goes
down the tubes.

A linux-only cluster has the advantage of being immensely stable -- we
are currently running nodes from OS installation/upgrade to OS
installation/upgrade with no non-hardware related failures in between,
even nodes running NIS (which "works" for us since we run mostly EP
code on <100 nodes so far), now that we've started rebooting the NIS
servers therapeutically to deal with the NIS memory leak.

Glancing at our two clusters, one has been up for 73 days (except for
one node which I've used to demonstrate kickstart node installs in the
last week to visitors from a nearby school, and two nodes that were
"busy" during the last OS upgrade and were upgraded late).  The other
cluster has been up 21 days, which is also an upgrade epoch (we upgrade
them separately to new RH kernels as they appear in the RH updates
directory).  If it weren't for the soon-come RH 7.2 (which we've been
running in beta on selected hosts and are kickstart-ready to upgrade
when they finally release it) I'm confident that I could run those nodes
"indefinitely" -- very possibly until they break or some act of God like
a power failure or nuclear war takes them down.

There may be some other Unix variants out there that can boast similar
stability, but not many.

    rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email: rgb at phy.duke.edu






