[Beowulf] Which distro for the cluster?

Robert G. Brown rgb at phy.duke.edu
Sun Jan 7 08:41:57 PST 2007


On Sun, 7 Jan 2007, Andrew M.A. Cater wrote:

>> BTW, the cluster's servers were not (and I would not advise that servers
>> ever be) running the old distro -- we use a castle keep security model
>> where servers have extremely limited access, are the most tightly
>> monitored, and are kept aggressively up to date on a fully supported
>> distro like Centos.  The idea is to give humans more time to detect
>> intruders that have successfully compromised an account at the
>> workstation LAN level and squash them like the nasty little dung beetles
>> that they are.
>>
>
> Can I quote you for Security 101 when I need to explain this stuff for
> senior management ?

Sure.  If you arrange to get me there and pay me an exorbitant fee, I'll
bring along my sucker rod and explain it to them myself on your behalf.

(I wouldn't try to extort the exorbitant fee for it except that in my
experience, if management isn't paying you $150/hour plus expenses for
your expertise, they devalue it.  Besides, I've still got to get
SOMEBODY to pay for my kids' Nintendo W(here )i(s )i(t) -- once I
actually find one to purchase;-)

> It is _always_ worth browsing the archives of this list. Somebody,
> somewhere has inevitably already seen it/done it/get the scars and is
> able to explain stuff lucidly. I can't recommend this list highly enough
> both for it's high signal/noise ratio and it's smart people [rgb 1-8
> inclusive, for example]

Make that $200/hour...;-)

> Fedora Legacy just closed its doors - if you take a couple of months
> to get your Uebercluster up and running, you're 1/3 of the way through
> your FC cycle :( It doesn't square. Fedora looks set to lose its way
> again for Fedora 7 as they merge Fedora Core and Extras and grow to
> n-000 packages again - the fast upgrade cycle, lack of maintainers and
> lack of structure do not bode well. They're apparently moving to a 13 month
> upgrade cycle - so your Fedora odd releases could well be three years apart.
> The answer is to take a stable distribution, install the minimum and work
> with it OR build your own custom infrastructure as far as I can see.
> Neither Red Hat nor Novell are cluster-aware in any detail - they'll
> support their install and base programs but don't have the depth of
> expertise to go further :(

Yeah, well, if RH asked me to be on their board of directors, I could
probably do something about a lot of this.  Their business plan is very
conservative and is "working" in that they are making money and keeping
their investors and the community simultaneously at bay, but they are
also missing multiple opportunities to really solidify their position and attack
Microsoft head on.  Fedora is working very well in many respects for
them -- I've been very happy with it since roughly FC 2 (FC 1 sucked,
partly because of the emergence of x86_64 and partly for other reasons).

I mean what the heck, they're right down the road, right?  I could
probably drive to board meetings and they could pay me in options...;-)

> Chances are that anything Red Hat Enterprise based just won't work. New
> hardware is always hard.

Yeah, starting right here.  There are several boats RH is missing, but
this is the biggest one.  RH can freeze all sorts of things in any given
distro and just maintain them, but the kernel and its associated
hardware layer of support tools (and applications that directly access
them) are NOT AMONG THEM!  Having fixed release and support cycles in
terms of months or years is just silly.  Release cycles should be
determined by the way the product itself evolves, which in turn is a
marvelous and somewhat erratic function of the rapidly changing hardware
market and the whims of the toplevel developers (kernel, compiler, main
unix libraries, X).  Application space needs to be DECOUPLED from some
sort of sane base.

This is what they haven't yet grokked -- it is long since past time for
linux in general to separate into two distinct pieces, e.g. Fedora
>>core<< (which should be really minimal but well maintained for a "long
time") and Fedora >>applications<< which should be be entirely separate.
Precisely the same split should be visible in e.g. RHEL -- a core that
is large enough to support commercial applications with aggressive
kernel and hardware-layer updates and number of distinct layers of
applicationware -- X all by itself (separate from the core for RHEL
since servers don't need it and it really isn't desireable there as it's
more to validate, more to secure), DB ware, userspace applications in
general, etc.

With yum, all the work of being able to support partitioned maintenance
on the server or workstation itself is DONE, but the num-nums don't seem
to realize it.  Microsoft would go mad (and go broke) if they tried to
enforce a clean rebuild of every application in the Universe for every
new OS version they release.  And of course they can't -- even though
they've been systematically engulfing makers of WinXX software for
years, as rapidly as antitrust laws permit, there are still so many
companies out there making hardware with device drivers on disk or
standalone software packages that Microsoft pretty much has to
distribute a core OS and leave it up to the user to break the hell out
of things with InstallShield and battling libraries from ill-built or
out-of-date software packages.
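
Just to show what "the work is DONE" means in practice: a sketch of a
"partitioned" yum configuration on a client, with a core layer and an
application layer maintained on completely different schedules (repo
names, URLs and paths are all made up for illustration):

   # /etc/yum.repos.d/core.repo -- kernel, glibc, hardware layer;
   # updated aggressively to track new hardware
   [core]
   name=Core layer
   baseurl=http://repo.example.com/core/$releasever/$basearch/
   enabled=1
   gpgcheck=1

   # /etc/yum.repos.d/apps.repo -- application space; frozen or
   # updated on its own, much slower, schedule
   [apps]
   name=Application layer
   baseurl=http://repo.example.com/apps/$releasever/$basearch/
   enabled=1
   gpgcheck=1

yum doesn't care that the two repos march to different drummers --
dependency resolution works either way.  The missing piece is purely
the distro-side discipline of actually building and maintaining the
layers separately.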

This is where RH missed the boat entirely.  Faced with a resource
problem as they tried to do the undoable and given a space of possible
solutions, they opted for one of the simplest, but least efficient, of
those solutions.  What they NEEDED to do -- and still need to do -- is
think long and hard about just how to reorganize support of RHE linuces
(and/or FC) so it is BOTH efficient enough to remain within their means
and the abilities of their software people to deliver AND capable of
staying up to date on the kernel/core across the board.  I can then
think of all sorts of ways they could choose to layer successive updates
of application space.  In fact, "Fedora" could refer ONLY to the
aggressiveness of updates in the application layer.

At any rate, I empirically have found Centos to be nearly useless for
roughly 1/2 of each upgrade cycle on whole classes of hardware.  On
laptops it is a joke (except for one 6 month window perhaps right after
it comes out).  On x86_64 hardware it has been a crap shoot.  Even on
i386 hardware, one has the usual problem with this device or that
device, especially in a desktop environment where users DO want their
onboard video or sound or network to work (on server class hardware and
apps it is more likely to work).  Even FC makes me wait on laptops and
some desktop hardware.

THIS is one of two or three places where Lin still suffers relative to
Win -- Windows "always" works on any platform you buy because it is
"always" preinstalled and vendors experience pain and suffering if it
doesn't preinstall in a functional state.  Lin requires me to spend a
quiet hour of moderately expert time googling and reading stuff from
specialized sites to determine which (if any) firewire PCMCIA cards are
known to work before I dare to buy one, which cameras are likely to
work, which video adapters or sound cards are supported, which
motherboard CHIPSETS are known to work.

Bitch, bitch, bitch.  Sigh.

>> Nowadays, with PXE/Kickstart/Yum (or Debian equivalents, or the OS of
>> your choice with warewulf, or...) reinstalling OR upgrading a cluster
>> node is such a non-event in terms of sysadmin time and effort that it
>> can pretty much be done at will.
>
> I've had the pleasure/pain of watching cluster admins from a distance
> as they worked on a fully commercial cluster from major vendors. For
> most on this list, its a no-brainer. I wish I had seen the same.

Rather than say it is a no-brainer, perhaps it is fairer to say that
once one makes a relatively modest investment in training the brain to
learn how to use certain well-supported toolsets and ideas, it becomes
easy and the investment is paid back tenfold.  We're not quite to where
we have a "build-a-bear" GUI front end for cluster building or a
complete "cluster package" in any of the major distros, as far as I
know, although the warewulf folks and maybe the scyld folks and possibly
some others are getting there in their own distinct ways.

Again this is fairly silly.  Installing a cluster in this way and
installing a workstation or office LAN in this way (via PXE/KS/Y) are
really pretty much the same general task -- they differ only in package
selection and possibly -- I say possibly -- in the way workstations or
office systems are named.
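
To make the point concrete: the only interesting difference between a
"node" kickstart and a "desktop" kickstart is usually a handful of
lines in the package selection, something like the following fragments
(group and package names are illustrative and vary by distro/release):

   # node.cfg -- compute node
   %packages
   @core
   @base
   lam            # or your MPI of choice

   # desktop.cfg -- office desktop
   %packages
   @core
   @base
   @gnome-desktop
   openoffice.org
   firefox

Everything else -- the PXE/DHCP plumbing, the partitioning and network
stanzas, the post-install yum updates -- can be shared verbatim between
the two.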

Imagine a Red Hat sales rep walking into an office with a laptop (with a
gigE interface and a halfway decent fast disk) and an 8 port gigE switch
with cables.  He sets up the laptop and "borrows" four or five office
desktops and cables them into the switch.  He powers them on and sets
their BIOS to boot from the network first, with a standard 3 second or
whatever timeout.  They boot up, and -- magic! -- they are running
RHEL-whatever, with ooffice etc installed and ready to run.  WinXX is
still untouched on their native disks.  Everything is bulletproof and
automaintaining, with a clear partition between userspace and rootspace,
full control over user accounts and access, etc.
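
The "magic" is nothing more exotic than DHCP plus PXE pointing the
desktops at a network-bootable image, so the local WinXX disks never get
touched.  Something along these lines on the laptop would do it
(addresses and paths invented for illustration, and it assumes a
kernel/initrd built for an NFS root):

   # /etc/dhcpd.conf (fragment) -- hand out addresses and a bootloader
   subnet 192.168.10.0 netmask 255.255.255.0 {
       range 192.168.10.100 192.168.10.120;
       next-server 192.168.10.1;     # the laptop, running tftpd
       filename "pxelinux.0";
   }

   # /tftpboot/pxelinux.cfg/default -- boot diskless off the laptop
   DEFAULT linux
   LABEL linux
       KERNEL vmlinuz
       APPEND initrd=initrd.img ip=dhcp root=/dev/nfs nfsroot=192.168.10.1:/export/rhel ro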

He removes the systems from the switch and puts them back on their
native LAN and reboots them to WinXX, and points out that installing and
maintaining Lin is just that easy.  He could have them set up overnight
with a Lin server that supports WinXX clients and Lin clients that boot
just that way, and that permits the office staff to gradually convert to
Lin as they learn that it is mostly virusproof, that ooffice pretty much
just "works" like msoffice, that a browser is a browser and firefox is a
decent one, that there are several hundred free nifty desktop games to
while away those tedious cubicle hours when nobody's looking instead of
three.  All at a cost of $50/seat, and they can get rid of 2/3 of their
admin staff at the same time because one admin can easily support
100-200 desktop seats...

Hey, I can dream, can't I?

>> The worst thing that such a strategy might require is a rebuild of user
>> applications for both distros, but with shared libraries to my own
>> admittedly anecdotal experience this "usually" isn't needed going from
>> older to newer (that is, an older Centos built binary will "probably"
>> still work on a current FC node, although this obviously depends on the
>> precise libraries it uses and how rapidly they are changing).  It's a
>> bit harder to take binaries from newer to older, especially in packaged
>> form.  There you almost certainly need an rpmbuild --rebuild and a bit
>> of luck.
>>
>
> I use Debian - I've never had to learn about more than one repository
> and one distribution for everything I need. What is this "rebuilding" of
> which you speak :)

Ha.  I remember well the time that we considered Debian in our
department and rejected it because its stable distro suffered from
precisely the same problem that RHEL/Centos suffer from now.  It was
very stable, it worked excellently well, and it was way, way behind the
hardware curve in libraries and kernel support.  It may well be that
they've done a better job than RH at recognizing this as a core user
requirement in pretty much any environment so that the stable release
tracks the kernel and new hardware better (dealing with libraries and
dependencies as required).  It would be pretty easy to do.  It's just
unfortunate that Linux has never QUITE managed to turn the corner and
create clean layers of separation between the hardware and kernel, the
core libraries and compiler, and application space.  Hence the need for
distributions at all per se, hence the need for distribution "releases"
with applications pretty much all rebuilt just for the functional core
in question.

The weird thing is that in principle, both rpm and apt permit one to do
much better.  This is really a problem in computer science and software
design and OS organization that is SOLVABLE.  Packaging schema contain
the hooks required to do so, and the open source community has worked
out truly awesome methodology for maintaining a >>huge<< collection of
packages (I just grabbed images of FC 6 for i386 and x86_64 from Duke's
repo, and they ate close to 30 GB of disk for just the binary RPMs!).
The problem is all in the partitioning -- creating "independently"
maintainable layers.
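
The hooks really are there, down at the level of individual library
sonames -- every binary rpm already advertises exactly what it needs
and exactly what it provides, so a resolver can in principle stitch
independently maintained layers together however the repos are sliced.
For example:

   # what a package needs, down to library sonames and versions
   rpm -q --requires firefox

   # which installed package satisfies a given dependency
   rpm -q --whatprovides libc.so.6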

I have modest hopes for HAL -- it was something of a joke previous to FC
5 or 6, but in 6 it actually works perfectly and transparently a lot of
the time.  This is the kind of thing that is necessary -- with enough
abstraction it might be possible to maintain a kernel snapshot
"indefinitely" by simply updating its collection of modules and hal
itself, so that applications "just work" with new hardware without
having to upgrade to an unstable/rawhide release.

>> Truthfully, cluster installation and administration has never been
>> simpler.
>>
>
> I think you underestimate your expertise - and the expertise on this
> list. My mantra is that cluster administration should be simple and
> straightforward: in reality, it's seldom so.

It depends on the paradigm you adopt, and how lucky you are in terms of
hardware matching the capabilities of your distro/release.  Which
perhaps "shouldn't" be a matter of luck, but often is as there is
nothing that can protect you from "lemon" hardware but buying from a
vendor that will if necessary completely replace it.  (Even prototyping
won't always reveal a problem -- it just "probably" will.)

IF you select hardware from a vendor that guarantees hardware
compatibility with any of the current/mainline distros -- and there are
several that do -- AND you select one of those mainline (well-supported
and automagically installable) distros AND you learn to master its
automagic installation techniques, then managing any sort of linux
operation from a single machine to an organization-spanning LAN
consisting of an arbitrary mixture of servers, workstation/office LANs,
and clusters has never been simpler.  That is a true statement.  A
single repo mirror set, a single homemade package repo, and PXE permit a
single individual to provide ALL the software installation and
maintenance support required by a large company under these
circumstances.  Individuals can install linux on their own hardware (at
their own risk) at will from the repo(s), departments that follow the
hardware rules can install and maintain standardized systems any of a
number of ways, and in all cases a pro-class distribution updates all of
these systems in a fully automatic way e.g. nightly to the current repo
update level, making it easy to install new software or update old
software.
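
The whole maintenance loop really is just a few moving parts -- a
mirror of the distro repos, a local repo for homemade packages, and a
nightly pull on every client.  A sketch (paths are illustrative):

   # on the repo server: regenerate metadata for the homemade repo
   # whenever a new in-house RPM is dropped in
   createrepo /var/www/html/repo/local/6/x86_64

   # on every client: /etc/cron.daily/yum-autoupdate
   #!/bin/sh
   /usr/bin/yum -y update >/dev/null 2>&1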

Cluster admins have it even easier, as their (linux distro compatible)
nodes are likely to be all IDENTICAL (in groups, at least, over several
generations) and homogeneity is the friend of the administrator just as
heterogeneity is Evil Incarnate.  Give me a switch and cables and a
rackful of Penguin boxes (please!:-), one equipped with a row of
hot-swappable disks and a tape library, and I'll take my laptop and its
currently FC6-full backpack disk and return you a functional cluster in
the amount of time required to physically assemble the nodes plus less
than a day to (re)install them with a perfectly reasonable cluster
configuration, very nearly independent of the number of nodes or racks.
Give me a couple or three days and I can probably arrange to install the
cluster a couple or three different ways -- diskful, diskless, mixed,
scyldified.

Not ALL cluster needs would be satisfied by this of course.  That's the
basic problem described in detail above.  If the cluster "required"
RHEL/Centos release X so it could run commercial package Y (and it
didn't just run anyway on FC6, which it probably would do:-) and the
penguin hardware "required" FC6 because older RHEL/Centos kernels just
don't support the network device or dual core dual CPU AMD x86_64 BIOS,
then yeah, you enter one form of Linux Hell from which there is no easy
escape but to not get the unsupported hardware in the first place, no
matter how much your users beg for bleeding edge hardware, OR to get
your #&!@ software vendor that you are PAYING to REBUILD their damn
application for FC6 (and in the process, package the thing up so it
autobuilds as RPMs or whatever) at least as well as all the maintainers
of the 6000-odd FREE packages in FC6 manage to package them up (grrr) OR
to backport kernels and key libraries from FC6 to RHEL/Centos whatever
-- maybe, possibly, don't hold your breath.  Hell.

Yup -- yum and friends permitted RPM-derived linuces to emerge from
the long night of software dependency hell (where Debian had long since
stepped into the light).  It is time to really focus on hardware
dependency hell and conditional provisioning trees, both of which are
well within the capabilities of modern packaging systems and the general
linux design.  Conditional provisioning trees, in particular, could
really revolutionize things and perhaps make it possible to get away
from the notion of the "complete distribution release".  The current
paradigm, which worked amazingly well for on the order of a few hundred
packages, does not scale to a few thousand particularly well, and we're
well on our way to 10 Kpkg and up distribution releases, which will be a
maintenance nightmare under the current scheme.  I think, anyway.
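
Even today the packaging layer has the rudiments of conditional
provisioning in it -- a single spec file can condition its dependency
tree on which "core" it is being built and provisioned against.  A
hypothetical fragment (package name and macro usage purely
illustrative):

   # mypackage.spec (fragment)
   %if 0%{?fedora} >= 6
   Requires: libfoo >= 2.0
   %else
   Requires: libfoo >= 1.2
   %endif

Scale that idea up from a single spec file to whole dependency trees
resolved at provisioning time and you get something much closer to a
"conditional provisioning tree" than to a monolithic release.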

The future should be interesting... as always.  It would be funny, in a
sick sort of way, if Windows manages to hold on in the face of linux
because it supports LESS software (but all of the hardware, nearly
perfectly).  Most people don't need more than a few hundred of the ~10
Kpkgs available.

    rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




