[Beowulf] A Good Linux Distribution to Start with?

Thu Sep 9 07:55:01 PDT 2004

On Wed, 8 Sep 2004, Dragovich, Jeff wrote:

> I am at the point in my beowulf cluster construction where I need to
> pick the Linux distribution to use. I have a small cluster (10 nodes,
> CISCO switch, a single control machine). The cluster will support
> parallelizing/benchmarking a finite element program using MPI. I am
> currently the only prospective user, and don't need sendmail and a bunch
> of that stuff. Just dev tools. 
>  
> Any comments on which Linux flavor to start with? I've read some jabs at
> Fedora. Can't find a FAQ (after about 4 hours of searching) that really
> discusses the pros and cons of each Linux variant related to Beowulf
> clusters. I know religion is a hot topic, but please don't flame the
> agnostic. :)
>  
> Jeff

I don't think the reason this goes unaddressed is religion; it is that
you can do perfectly satisfactory clustering with any
distribution.  Many or even most distributions support at least basic
clustering (with PVM and MPI) right in the distribution itself, so it is
just a matter of selecting the packages for installation and then
learning to use them.  Of course all of them support raw network
programming at the socket level.  Higher end cluster tools are often
also available for many of the distributions or are at worst a rebuild
away.
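
As a rough illustration of the socket-level option, here is a minimal
Python sketch in which a "master" sends a line of numbers to a "worker"
over TCP and reads back their sum.  The names worker and run_demo are
hypothetical, the language is chosen only for illustration, and both ends
run on loopback in one process; on a real cluster the worker would listen
on a node's own address.

```python
import socket
import threading


def worker(server_sock):
    """Accept one connection, read one line of numbers, reply with their sum."""
    conn, _ = server_sock.accept()
    with conn, conn.makefile("r") as reader:
        line = reader.readline()
        total = sum(float(x) for x in line.split())
        conn.sendall(f"{total}\n".encode())


def run_demo(values):
    """Send values to a worker thread over TCP and return the sum it reports."""
    # Port 0 asks the OS for any free port (an assumption made so the
    # sketch runs anywhere; a real node would use an agreed-upon port).
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    thread = threading.Thread(target=worker, args=(server,))
    thread.start()

    with socket.create_connection(("127.0.0.1", port)) as master, \
            master.makefile("r") as reader:
        payload = " ".join(str(v) for v in values) + "\n"
        master.sendall(payload.encode())
        reply = reader.readline()

    thread.join()
    server.close()
    return float(reply)
```

Calling run_demo([1.0, 2.0, 3.5]) exercises the full round trip.  PVM and
MPI exist largely so that one does not have to hand-roll this kind of
message passing.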

Fedora Core 1 had issues, but FC2 is working pretty well for us here,
both on desktops and (so far) on cluster nodes including Opterons.

CentOS (logo-free RHEL rebuild, stays within hours to days of RHEL at
the logoless package level) should work as well as RHEL, obviously, and
Red Hat itself (if you don't mind paying them on a per-node basis at
fairly absurd rates) has always been a decent package to use for
clustering.  SuSE ditto -- lots of turnkey vendors use SuSE as a basis.
Mandrake ditto -- it has its (IIRC) "CLIC" cluster-specific packaging.
In both of these latter two cases there are again issues of licensing
and charges on a per node or per cluster basis.  On the non-RPM-based
front, Debian is totally open and free and is certainly used in
clusters.  On the non-Linux (but still totally open source) front,
FreeBSD is used successfully in clusters.

There are a number of so-called "cluster distributions" to choose from
as well.  OSCAR is an older one that I'm not sure is still being loved
by anyone.  ROCKS is a newer one, built on top of (IIRC) a RH 9 (?)
release and maybe moving towards CentOS or FC?  CLIC I mentioned.  Scyld
is a commercial but very powerful and well-supported "beowulf in a box"
distribution, derived these days (I believe, but am not sure) from a RH
variant.  Scyld can cost a lot for full support and everything, but for
somebody doing what you are doing (basically learning/playing more or
less out of pocket) they might give you a significant break.
Clustermatic/bproc is a way of getting a lot of what Scyld offers in
a fully open source DIY way.  I'm probably leaving a bunch out --
nice diskless cluster projects, smaller and lesser known Linux variants
(which would still work), Caosity...

So you have a plethora of choices, and I'm not about to tell you which
one is "best" as the answer is none of them -- they are mostly pretty
good, with various constellations of advantages and disadvantages.

Not to let any good opportunity for editorializing pass, though...

...my major beef with most of the cluster distributions is that they
really require one to give up some of the simplicity, scalability, and
customizability of repository-based, package-based installation and
maintenance schemes.

In my opinion, the "best" way to install a cluster is from a repository
via PXE and something like kickstart if not kickstart itself, where the
only thing that differentiates a cluster node from a desktop workstation
is the selection of packages installed and some post-install
configuration.  An acceptable variant of this would be the newer
diskless cluster approaches, provided that the exported/cloned node
image is package-level controllable and can be kept up to date relative
to a well-maintained mirror tree of repositories with a tool like yum or
apt.
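
To make the "same install, different package selection" idea concrete,
here is a minimal Python sketch that renders the %packages and %post
sections of a kickstart file for two roles.  Everything specific in it is
an assumption for illustration: the function name
render_kickstart_sections, the role names, the package and group names,
and the post-install commands are placeholders, and a real kickstart file
also needs the usual install-source, partitioning, and network directives
that are omitted here.  Newer kickstart versions expect each section to
be closed with %end; older ones did not use it.

```python
# Packages shared by every machine built from the repository (placeholders).
BASE_PACKAGES = ["@base", "openssh-server"]

# The only per-role difference in package selection (placeholders).
ROLE_PACKAGES = {
    "desktop": ["@gnome-desktop"],
    "node": ["gcc", "make", "lam", "pvm"],
}

# Per-role post-install configuration (illustrative commands only).
ROLE_POST = {
    "desktop": ["echo 'desktop workstation' > /etc/motd"],
    "node": [
        "echo 'cluster compute node' > /etc/motd",
        "chkconfig sendmail off",
    ],
}


def render_kickstart_sections(role):
    """Return the %packages and %post sections for the given role."""
    if role not in ROLE_PACKAGES:
        raise ValueError(f"unknown role: {role}")
    lines = ["%packages"]
    lines += BASE_PACKAGES + ROLE_PACKAGES[role]
    lines += ["%end", "", "%post"]
    lines += ROLE_POST[role]
    lines += ["%end"]
    return "\n".join(lines) + "\n"
```

Printing render_kickstart_sections("node") and
render_kickstart_sections("desktop") side by side shows that the two
differ only in the package list and the post-install lines, which is the
whole point: everything else comes from the same repository and the same
install mechanism.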

This opinion extends down to some of the best known cluster packages,
many of which are still distributed via tarball and #ifdef'd to hell and
back or worse, built on top of evil such as e.g. aimk so they'll build
on every single variant of Unixoid or non-Unixoid operating system known
to mankind.  Tarball distribution (except to hackers or people working
on the code) is Evil.  Heavy code instrumentation to cope with non-(e.g.
posix)-compliant operating systems, ancient operating systems, or
commercial operating systems with non-open or non-compliant libraries is
Evil.
Proper packaging is Good.  Compliance with standards (to the point where
one has a clean build on an ANSI/POSIX system) is Very Good.  These
things make it >>easy<< to move a package between Linux distributions
and permit Linux distributions to be built and rebuilt without breaking
like hell all over the place.

RPM isn't perfect, but it isn't bad and it is in wide use and has smart
people actively working on it to improve it further.  APT is similarly
strongly driven by smart people, and the religious and technical
differences between the two simply serve to keep both on their mettle
in a competitive and evolutionary world (where in open source they can
easily steal the best ideas of their competitors until one day maybe
they converge -- or not).  Both are adequate as a basis of source level
distribution of entire packages that can be easily rebuilt for specific
distributions and repositories and purposes.  Alas, some very useful
cluster tools continue to eschew packaging (which would SIMPLIFY their
build process and help them tremendously to debug their code and make it
fully functional) and continue to waste energy getting their stuff not
only to build, but to build after each little debugging change, from
tarball, on thirty distinct systems, twenty of which are FUBAR at the
library level and remain broken anyway.

So (editorializing done) -- take a look at some of the stuff mentioned
above or linked from the various main Linux clustering sites.  If you
want the "simplest" approach and are already experienced with some Linux
distro, just use that distro as a base and install clustering packages
and get started that way fairly painlessly.  If you want full automation
(and to devote a fair bit of time learning to use it) look hard at stuff
like ROCKS.

Hope this helps,

   rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu