[Beowulf] best linux distribution

Robert G. Brown rgb at phy.duke.edu
Mon Oct 8 13:35:00 PDT 2007


On Mon, 8 Oct 2007, Mike Davis wrote:

> Robert G. Brown wrote:
>> On Mon, 8 Oct 2007, Mike Davis wrote:
>> 
>>> My experience is similar to Bill's. We've been using CentOs 3,4 for the 
>>> past few years on our larger clusters. It is a good choice for stability, 
>>> good performance, and since it is RH for SW compatability.
>> 
>> The only thing I'd comment on that is negative about it is one of its
>> "advantages".  There is a narrow line between stability and stagnation,
>> and you have to figure out which side of that line your cluster will
>> fall on.  Specifically, the fact that Centos/RHEL is frozen for two year
>> intervals has two disadvantages for some people:
>> 
>
> I don't see this as a problem in a production cluster. The fact is that I've 
> been doing this stuff for a little over two decades and I can build anything 
> that I need for an application. For me a manual library build for CentOs 3 is 
> easier than trying to find support for FC4 or reinstalling FC 1x per year. My 
> CentOs 3 nodes have had less than 2hours downtime in 2 years and that was due 
> to a Power Upgrade at their location, that required a complete shutdown of 
> all machines on the floor.

Which is perfectly reasonable, and I agree.  The tradeoff is whether,
and how much, you have to rebuild stuff (and how hard that stuff is to
rebuild).  At one time building e.g. cernlib was a cosmic pain in the
buttocks, for example -- I mean serious pain.  Been there, done that.
It was one of the major appeals of "Scientific Linux" when it first came
out -- cernlib was prebuilt for it, although somehow the libraries it
used were such that you couldn't just rpmbuild --rebuild its source RPM
back on Centos or for that matter Fedora.  But I've built (and tried to
build) it from all the way back when tarballs were what you had to work
with, and when you might expect to make twenty or thirty hacks to get
through the build. Paaaaiiin.

Now it IS in fedora (and may be in Centos for all that I know).  Because
it is in fedora, I'm reasonably sure that rpmbuild --rebuild is all that
is needed to rebuild it under Centos anyway.  But one's choice WAS build
it from scratch, unpackaged, or use fedora, hmmm...

In other words, if the issue is one or two libraries, and they are
decently packaged and have simple dependencies, and if the Centos kernel
WORKS for your hardware, then Centos is an excellent choice.  I was just
pointing out that there were some issues that one SHOULD think about
while making the decision lest one plan to install Centos on brand new
bleeding edge hardware only to learn that the kernel doesn't have
support for its chipset, or that your users need some five-library
constellation that is constantly being updated in Fedora (but which
works decently from snap to snap) but that has to be built, then
rebuilt, then re-rebuilt every three or four months under Centos.

In either case you can make things work with either distro (or any other
linux distro, really) -- they ONLY differ on how much work one has to do
and what kind of work it is that must be done to make them work.  If you
want transparent access to the latest tools and libraries that are
constantly being added to fedora, well, that favors fedora -- to enhance
your cluster instead of building a library or application constellation
you just add a yum install command to your nightly cluster update and
poof, there it is the next day ready to use, auto updating.  If you plan
to update your cluster with brand spiffy new nodes with bleeding edge
motherboards with integrated orange juicers and jolt cola dispensers,
I'd argue that that favors Fedora as well, as chances are much better
that your jolt cola dispenser will "just work" if you kickstart install
the latest fedora on them instead of a two year old Centos, and the
alternative is to not use your new systems until the latest Centos is
released (and pray they work then or the next time they will is two MORE
years later) or start dusting off those kernel building skills and
figure out how to add a whole stack of libraries leading to the
orange-juicer controls to your aging Centos install.
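The "add a yum install command to your nightly cluster update" step
really is that small.  A hypothetical cron fragment (the timing and the
package name are examples, not a prescription):

```
# /etc/cron.d/cluster-update -- sketch of a nightly node update
# (runs on every node; 4:30 AM and blas-devel are made-up examples)
30 4 * * * root yum -y update
# enhancing the whole cluster with a new library is then one more line:
35 4 * * * root yum -y install blas-devel
```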

If your cluster really only runs one application (or a small and
straightforward set of applications), and Centos supports your cluster
hardware and that application's library requirements out of the box,
well heck -- of course it is nice to be able to do an installation and
then just forget the cluster for the rest of its operational lifetime.
Done it myself, many times (though not with Centos per se).  If you are
comfortable inserting the requirement on future cluster hardware
purchases "and must be able to boot and run Centos X normally out of the
box" that's fine too.  If your primary applications REQUIRE RHEL/Centos
libraries for binary compatibility, or if you use proprietary software
that PROBABLY would work perfectly with Fedora but where they won't
answer your hotline questions or requests for support unless you are
using RHEL (they usually won't even accept "Centos" as a substitute in
this case, you just have to tell a tiny fib) well, that makes Centos a
really good choice!

How difficult is it to use Fedora instead of Centos in a production
cluster?  What is the cost tradeoff?  Basically it is a day or so of
work, per expected upgrade.  The chores involved are typically:

   a) Mirror the distribution repo from a suitable site onto your local
install server.  Time required: call it an hour to set up a script,
then five minutes to hack it to point at the current distro per
instance, plus a lag period where you actually do something else and the
download completes.
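The mirror script itself is a few lines of rsync.  A minimal sketch --
the release number, architecture, mirror URL, and local destination are
all assumptions you'd adjust per instance:

```shell
#!/bin/sh
# Sketch: mirror one Fedora release onto a local install server.
# RELEASE, ARCH, MIRROR and DEST are examples -- point them at a
# mirror near you and at your own repo tree.
RELEASE=7
ARCH=i386
MIRROR="rsync://mirrors.kernel.org/fedora/releases/${RELEASE}/Everything/${ARCH}/os/"
DEST="/var/ftp/pub/fedora/${RELEASE}/${ARCH}/"

# Print the command rather than running it; drop the echo (and make
# sure DEST exists) to do the actual mirror.
echo rsync -av --delete "${MIRROR}" "${DEST}"
```

Hacking it to point at the current distro per instance is then a matter
of editing RELEASE and rerunning.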

   b) Clone your PXE/DHCP targets and your kickstart files.  Install
vmlinuz and initrd from the new distro under tftpboot.  Time required:
call it an hour, although it probably won't take that long.
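Cloning the PXE target usually amounts to one new stanza under
pxelinux.cfg pointing at the new kernel/initrd pair and the cloned
kickstart file.  A hypothetical fragment (the label, paths, and server
name are made up):

```
# /tftpboot/pxelinux.cfg/default (fragment)
label fc7-install
    kernel fc7/vmlinuz
    append initrd=fc7/initrd.img ks=http://install.example.org/ks/node-fc7.cfg
```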

   c) Pick a node -- any node -- to prototype with.  Network boot the
node, into the cloned kickstart file.  Monitor the progress, especially
noting where packages are missing or weirdness occurs.

   d) Look over the package list for the new release, look over what is
missing, resolve conflicts, add stuff that looks like it would be useful
that before you maybe had to build on your own (like cernlib or ganglia
or openmpi).  Test the node with your primary applications (or not).  If
they are "boring" and would run out of the box on Centos, they'll almost
certainly run out of the box post rebuild under Fedora.  If they use
libraries that might have significantly changed, well, it's a good idea
to test them.  This process can take anywhere from a few minutes to
hours or even days.
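Most of the package-list work in step d) happens in the kickstart
%packages section.  A hypothetical fragment (the group and package
names are examples -- check what the new release actually calls them):

```
%packages
@base
@development-tools
# things you used to build by hand, now (possibly) packaged:
openmpi
ganglia-gmond
```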

In my case it would take minutes because my applications are boring and
I know damn well they'll rebuild under Fedora whatever.  If anything,
they're likely to have trouble BACKporting to run under e.g.  Centos 4
-- from my own personal point of view backporting libraries is a total
pain and a process fraught with peril, especially if they or their
dependencies have significantly evolved in the meantime.  However, there
are definite exceptions -- Cisco's vpnclient recently died on me in a
mere UPDATE of the fedora kernel, because they changed a bunch of
stuff in the skbuff stack.  As this example demonstrates, many of those
exceptions involve proprietary software that nobody maintains (really!).
If they maintain it, they have to spend some of the money you pay them
on testing and debugging, after all, and they hate that.  Far easier to
just insist that you freeze your environment in a version that is the
last one for which they grudgingly were FORCED to make it work.

Once you are satisfied, and have modified your kickstart file to match
your new improved package list (which may involve no modification at
all, mind you) and you've tested it successfully -- you basically set a
toggle that causes all your nodes to reboot some dark evening and
PXE-reinstall themselves from the kickstart file.  You COULD even do
this unattended, although it would probably be foolish to.  Either way
the time required to initiate the upgrade is very small (per node) and
if you did your node testing adequately and encounter no further
problems you can read a novel or exercise or something while it occurs.
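The "toggle" can be as small as repointing each node's pxelinux config
at the install target and rebooting it.  A dry-run sketch -- the node
names, the ssh loop, and the tftpboot layout are all assumptions:

```shell
#!/bin/sh
# Dry run: print what would be done per node.  Remove the echos to
# actually flip the boot target and trigger the PXE reinstall.
NODES="node01 node02 node03"           # your node list here
for n in $NODES; do
    echo "ln -sf install-fc7 /tftpboot/pxelinux.cfg/$n"
    echo "ssh $n /sbin/shutdown -r now"
done
```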

Add it all up, and you find that doing a node upgrade might take you
anywhere from half a day to a day and a half.  You have to do this work
at least one time roughly 18 months after initially installing the
nodes, and again if you plan to run them after 3 years, although of
course you can ELECT to update more often than that if Fedora has a
fabulous new library added to it that halves the runtime of a lot of
numerical code based on it, or if somebody adds a really spectacular
batch job system to it and replaces the one you're paying for with
something that works better and is easier to use and is free, or if you
just need it to run the hardware you buy a year from now (or else you
have to build a custom kernel, test it, and maintain an update stream
for it by hand forever) and you'd rather run the same OS release on your
entire cluster.  This is what you are trading off against the labor
required to instead keep a pool of libraries and/or the kernel more
aggressively up to date.  To my experience, it doesn't take many library
or auxiliary tool rebuilds, and probably only ONE kernel rebuild, to
compete with the work required to upgrade fedora one time over the
course of a cluster's lifetime.

Either way, you can also choose to run Centos on the servers for your
cluster (a common enough decision, since server hardware tends to be
less aggressively upgraded and since server "stability" is paramount,
although as I said the INstability of Fedora is largely urban myth
anytime sixty days or so post initial release) or you can follow through
and bootstrap your servers to Fedora current every now and then as well.

One LAST issue that should be addressed is what your USERS are running
on their desktops.  This is a nontrivial question for some cluster
architectures.  If they are running Fedora (so that they get the latest
versions of X, support for their super-duper graphics adaptors and
cameras, the best possible list of printers, so that their laptops have
an even chance of working with their audio and network devices, so that
they can get "flight of the amazon queen" on their desktop) AND if the
cluster architecture is "flat, with NFS shared across the accessing LAN"
-- so that basically the "cluster" is just a pile of headless
workstations on the same LAN and mounting the same disk as the desktops
and/or laptops -- then there is a STRONG incentive to use fedora on the
cluster nodes.  If you don't, users cannot do a build on their desktop
and drop the resulting binaries into the execution queue for the cluster
or otherwise arrange for them to be run in distributed fashion.

It is difficult to assess the cost-benefit tradeoffs here because the
costs are all to the sysadmin and the benefits are all to the user, but
they are substantial, and almost certainly would strongly favor using
the same thing everywhere, probably Fedora.  After all, so it is a pain
in the sysadmin's behind once every 18 months or so to run Fedora, but a)
he/she's doing so anyway to support the desktops, and most of the steps
above are already accomplished before he or she STARTS on the cluster
nodes, and b) the users REALLY REALLY save a lot of time being able to
build, test, debug, and even carry out small prototyping runs on their
own desktops without needing to do all of that on "the cluster" or a
special "build box" with the right library/distro set up, and there are
probably a lot more of them than there are sysadmins.

To re-summarize:  People considering what distribution to use to build a
cluster should think about the following:

    a) What are they familiar with?  You can build a cluster on top of
any distribution, but if you are a Debian expert you're going to find
building a Centos cluster painful and vice versa.  I think most of us on
list would advise "go with your strength" here unless you encounter a
really good reason to do otherwise, at least until you have multiple
strengths or unless you have no strengths at all and have to start from
scratch.  In that and all cases, continue asking yourself:

    b) How scalable is it?  I personally think that scalability and
automation are key elements in the decision, both for clusters and their
closely related client/server LAN installs.  Lots of people like FAI.  I
personally am fond of PXE/kickstart.  Warewulf is pretty easy and
scalable.  Pick something where you do work once, then implement it
across the cluster (or LAN) without having to do more per-node work than
"booting it", possibly making a BIOS level choice as to boot target and
a DHCP choice as to installation image.  This is one of Linux's
STRENGTHS, especially compared to e.g. Windows.  If you have to do more
work than that (per node) you probably need to reexamine your distro
choice or learn new tools associated with the distribution.

    c) How well does it work with my preferred hardware (now and in the
future)?  Again, different distros have very different track records for
staying hardware-current out of the box, for obvious reasons.  If you
want to track bleeding edge hardware, select a bleeding edge distro, not
one that has a 2+ year release cycle and doesn't change much in between
except for bug fixes.  I WISH that this weren't the case -- I wish that
Centos maintained the kernel and device list much more aggressively than
they have in the past -- but they haven't and that's a simple fact.  And
yet, frequently, it doesn't matter (so don't interpret this as "Centos
is bad").

    d) How well does it provision libraries (or non-library packages,
toolsets) I'm likely to need in my cluster's work, and how rapidly do
those libraries vary?  Note that this question can easily work EITHER
WAY -- in some cases one wants NO variation so your proprietary
applications get just the right library, period.  In other cases your
users will be sitting there growling at you and nagging you to build/add
a current version of first this library, then that one as they use
features that just aren't there in the library space of the
long-lifetime distros after their first six months.  Note that some
libraries are rapidly varying and under continuous development with
really important changes occurring with some frequency, while others
have APIs that have varied little over years.  Maybe your cluster
applications need one type, maybe the other.  Maybe (God help you) both.
There is no set answer to this question, and it may even require a
least-of-possible-evils compromise.

    e) What do we use in our LAN for desktops and laptops?  Again, this
is almost a no brainer (and is a question that may not even occur to
people who run an isolated "cluster compute center", rather than a
cluster integrated with and immediately accessible to a departmental or
research LAN).  If all your LAN desktops are running Debian but your
cluster is running two year old Centos, you're forcing them into a very
unnatural and restrictive work model and you're going to be constantly
pressured to backport from one to the other.  You're also significantly
increasing your work load as you try to cope with the differences
between Debian management and Centos management.  An "ideal" situation
to craft is one where there is a smooth path from desktop to cluster,
where a "make" of your source tree on one is pretty much guaranteed to
produce run-ready binaries (including access to all required dynamic
link libraries) for the other, within the boundaries of HARDWARE
architecture, not distribution or binary compatibility model within a
hardware type.  In a lot of clusters it makes sense to completely
flatten NFS space as well to further facilitate this -- project space
that is commonly mounted across cluster and server and desktop so that
even paths on the cluster match paths on your desktop.
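Flattening NFS space so paths match is mostly a matter of mounting the
same export at the same mountpoint everywhere.  A hypothetical /etc/fstab
line, identical on desktops, servers, and nodes (server and paths are
made up):

```
# same line in /etc/fstab on desktops, servers, and cluster nodes:
fileserver:/export/project  /project  nfs  rw,hard,intr  0 0
```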

Yes, it is perfectly possible to get by without this, and some cluster
architectures (especially "standalone" clusters, regular beowulf type
clusters, computer center type clusters) work just fine without it, but
there are lots of research clusters where this is the way they are set
up, with little or no barrier between the desktop LAN and the cluster
LAN.  In the latter case, the time saved on development and testing and
implementation vastly outweighs the time spent keeping e.g. Fedora up to
date on cluster nodes, in part because you have to keep Fedora up to
date on your desktops anyway, and the only difference between a cluster
node and a desktop is the package selection in a kickstart file or
post-install yum scripts.

    f) Sundry other issues, or "miscellaneous things to look at that I
can't quite pin down here".  For example, if you're building a cluster
out of obsolete and underequipped systems (a thing I not infrequently
advise e.g. high school students on, offline) you may find that running
the latest release of ANY distribution out of the box is quite
difficult, but that you can get the linux on a rescue CD to boot quite
nicely, or perhaps a really old version of some existing distribution.
Or perhaps you're playing the "let's install a graphics card and use it
to do computations" game, where the libraries and cross-compilers that
enable it only exist pretty much for one distribution, maybe.  Or your
cluster is part of a grid, or part of a flat LAN or a standalone
beowulf, or has to integrate with a Windows cluster where the best you
can hope for in the way of "compatibility" is cygwin-alikeness (so you
pick the
most cygwin-like distro you can).

Every cluster is different, everybody's needs are different, and YMMV.
Which is why none of the EXPERTS here are going to say "you should
ALWAYS use Fedora 6, but only the version that was available three
months ago before they screwed up the kernel and with openmpi built from
fresh source as it isn't up to date in the form our users expect".  The
best choice isn't universal, it is dictated by your own degree of
experience and knowledge, your design goals, your application mix, your
cluster type, your cluster's general environment and support structure,
and even then -- you can get ANY linux distro to WORK for your cluster
even if you get one or more of these "wrong" in the sense that ex post
facto they turn out to be suboptimal.

Live and learn, in other words, and be prepared to experiment and change
as it makes sense to do so.

    rgb

>
> Now I should say, that I don't use diskless nodes, each node has its own OS 
> disk and most have a separate /tmp disk for scratch use. That is one reason 
> that we differ on OS, I believe.
>
>
> Mike
>

-- 
Robert G. Brown
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone(cell): 1-919-280-8443
Web: http://www.phy.duke.edu/~rgb
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


