[Beowulf] Which distro for the cluster?
Robert G. Brown
rgb at phy.duke.edu
Thu Dec 28 23:48:04 PST 2006
On Fri, 29 Dec 2006, Andrew M.A. Cater wrote:
> On Fri, Dec 29, 2006 at 09:39:59AM +1100, Chris Samuel wrote:
>> On Friday 29 December 2006 04:24, Robert G. Brown wrote:
>>
>>> I'd be interested in comments to the contrary, but I suspect that Gentoo
>>> is pretty close to the worst possible choice for a cluster base. Maybe
>>> slackware is worse, I don't know.
>>
>> But think of the speed you could emerge applications with a large cluster,
>> distcc and ccache! :-)
>>
>> Then add on the hours of fun trying to track down a problem that's unique to
>> your cluster due to combinations of compiler quirks, library versions, kernel
>> bugs and application oddities..
>>
>
> This is a valid point. If you are not a professional sysadmin / don't
> have one of those around, you don't want to spend time needlessly doing
> hacker/geek/sysadmin type wizardry - you need to get on with your
> research.
Also, how large are those speed advantages? How many of them cannot
already be obtained by simply using a good commercial compiler and
spending some time tuning the application? Very few tools (ATLAS being
a good example) really tune per microarchitecture. The process is not
linear, and it is not easy. Even ATLAS, which tunes "automatically",
does so by what amounts to a multidimensional gradient search based on
certain assumptions -- I don't think it would be easy to prove that the
optimum it reaches is a global optimum.
> Red Hat Enterprise based solutions don't cut it on the application
> front / packaged libraries in my (very limited) experience. The
> upgrade/maintenance path is not good - it's easier to start from scratch
> and reinstall than to move from one major release to another.
>
> Fedora Core - you _have_ to be joking :) Lots of applications - but
> little more than 6 - 12 months of support.
No, not joking at all. FC is perfectly fine for a cluster, especially
one built with very new hardware (hardware likely to need a very recent
kernel and libraries to work at all) and actually upgrades-by-one tend
to work quite well at this point for systems that haven't been
overgooped with user-level crack or homemade stuff overlaid outside of
the RPM/repo/yum ritual.
Remember, a cluster node is likely to have a really, really boring and
very short package list. We're not talking about major overhauls in X
or gnome or the almost five thousand packages in extras having much
impact -- it is more a matter of the kernel and basic libraries, PVM
and/or MPI and/or a few user's choice packages, maybe some specialty
libraries. I'm guessing four or five very basic package groups and a
dozen individual packages and whatever dependencies they pull in. Or
less. The good thing about FC >>is<< the relatively rapid renewal of at
least some of the libraries -- one could die of old age waiting for the
latest version of the GSL, for example, to get into RHEL/Centos. So one
possible strategy is to develop a very conservative cluster image and
upgrade every other FC release, which is pretty much what Duke does with
FC anyway.
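To make the "boring, short package list" concrete, here is a sketch of a shell function that prints the %packages section of a kickstart file for such a node. The group names and package names in the example call are assumptions chosen for illustration. Use whatever your release's comps groups and repos actually provide. The closing %end line is required by newer Anaconda releases, so check what your installer expects.

```shell
# Sketch: print the %packages section of a kickstart file for a minimal
# compute node. Arguments are space-separated lists:
#   $1 = package groups, $2 = individual packages
ks_packages_section() {
    echo "%packages"
    for group in $1; do      # unquoted on purpose: split the list on spaces
        echo "@$group"       # a leading '@' names a package group
    done
    for pkg in $2; do
        echo "$pkg"          # bare names are individual packages
    done
    echo "%end"              # required by newer Anaconda releases
}

# Example (hypothetical group and package names):
# ks_packages_section "core base" "gsl gsl-devel pvm openssh-server" > packages.ks
```

A dozen or so lines like these, plus dependencies, are the whole software footprint that has to be re-validated at each upgrade.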
Also, plenty of folks on this list have done just fine running "frozen"
linux distros "as is" for years on cluster nodes. If they aren't broke,
and live behind a firewall so security fixes aren't terribly important,
why fix them? I've got a server upstairs (at home) that is still
running <blush> RH 9. I keep meaning to upgrade it, but I never have
time to set up and safely solve the bootstrapping problem involved, and
it works fine (well inside a firewall and physically secure).
Similarly, I had nodes at Duke that ran RH 7.3 for something like four
years, until they were finally reinstalled with FC 2 or thereabouts.
Why not? 7.3 was stable and just plain "worked" on at least these
nodes; the nodes ran just fine without crashing and supported
near-continuous computation for that entire time. So one could also
easily use FC-whatever by developing and fine tuning a reasonably
bulletproof cluster node configuration for YOUR hardware within its
supported year+, then just freeze it. Or freeze it until there is a
strong REASON to upgrade it -- a miraculously improved libc, a new GSL
that has routines and bugfixes you really need, superyum, bproc as a
standard option, cernlib in extras (the latter a really good reason for
at least SOME people to upgrade to FC6:-).
Honestly, with a kickstart-based cluster, reinstalling a thousand nodes
is a matter of preparing the (new) repo -- usually by rsync'ing one of
the toplevel mirrors -- and debugging the old install on a single node
until satisfied. One then has a choice between a yum upgrade or (I'd
recommend instead) yum-distributing an "upgrade" package that sets up
e.g. grub to do a new, clean, kickstart reinstall, and then triggers
it. You could package the whole thing to go off automagically overnight
and not even be present -- the next day you come in, your nodes are all
upgraded.
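A sketch of the grub half of such an "upgrade" package follows. It assumes GRUB legacy with /boot on its own partition at (hd0,0). It also assumes the installer kernel and initrd (typically found under images/pxeboot/ in the new repo) were copied into /boot as vmlinuz-ks and initrd-ks.img. Those file names and the kickstart URL are hypothetical. The function only prints the menu entry. The package's scriptlet would still have to append it to grub.conf, point the default= line at it, and reboot. Older Anaconda takes the kickstart location as ks=, while newer releases spell it inst.ks=.

```shell
# Print a GRUB-legacy menu entry that boots the installer with a kickstart
# file. Assumes /boot is its own partition at (hd0,0), so paths are relative
# to it, and that the installer kernel/initrd were copied there under the
# hypothetical names vmlinuz-ks and initrd-ks.img.
grub_ks_stanza() {
    ks_url=$1    # where the node fetches its kickstart file
    cat <<EOF
title Kickstart reinstall
        root (hd0,0)
        kernel /vmlinuz-ks ks=$ks_url
        initrd /initrd-ks.img
EOF
}

# Example (hypothetical URL):
# grub_ks_stanza http://repo.example.edu/ks/node.cfg >> /boot/grub/grub.conf
```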
I used to include a "node install" in my standard dog and pony show for
people who came to visit our cluster -- I'd walk up to an idle node, reboot
it into the PXE kickstart image, and talk about the fact that I was
reinstalling it. We had a fast enough network and tight enough node
image that usually the reinstall would finish about the same time that
my spiel was finished. It was then immediately available for more work.
Upgrades are just that easy. That's scalability.
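On the PXE side, pxelinux looks for a per-machine config file named after the node's MAC address before it falls back to pxelinux.cfg/default. The name is 01- (Ethernet) followed by the address in lowercase with dashes. Here is a sketch of helpers that compute that name and print a boot entry that goes straight into a kickstart install. The kernel/initrd file names, TFTP path, and kickstart URL are assumptions.

```shell
# pxelinux tries pxelinux.cfg/01-<mac, lowercase, dash-separated> first,
# so a single node can be sent to the installer without touching the rest.
mac_to_pxe_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'ABCDEF:' 'abcdef-')"
}

# Print a pxelinux config that boots straight into a kickstart install.
# vmlinuz/initrd.img are assumed to sit in the TFTP root.
pxe_ks_config() {
    ks_url=$1
    cat <<EOF
DEFAULT ks
LABEL ks
  KERNEL vmlinuz
  APPEND initrd=initrd.img ks=$ks_url
EOF
}

# Example (hypothetical MAC, TFTP path, and URL):
# pxe_ks_config http://repo.example.edu/ks/node.cfg \
#     > /tftpboot/pxelinux.cfg/"$(mac_to_pxe_name 00:1A:2B:3C:4D:5E)"
```

Removing that per-node file after the install lets the node fall back to the default entry, which can simply boot from the local disk.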
Warewulf makes it even easier -- build your new image, change a single
pointer on the master/server, reboot the cluster.
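Warewulf's own commands aren't reproduced here. As a generic illustration of "change a single pointer", one common pattern is a symlink on the server that names the image the nodes boot. The new link is created under a temporary name and renamed over the old one, so the swap is atomic. The directory layout and the link name "current" are hypothetical, and the sketch relies on GNU mv's -T option.

```shell
# Atomically repoint "$images_dir/current" at a freshly built image.
# Hypothetical layout: one subdirectory per image inside images_dir.
switch_image() {
    images_dir=$1
    new_image=$2
    if [ ! -d "$images_dir/$new_image" ]; then
        echo "no such image: $new_image" >&2
        return 1
    fi
    # Build the new link under a temporary name, then rename it over the
    # old link; rename(2) is atomic, so readers see either old or new.
    ln -s "$new_image" "$images_dir/current.tmp.$$" || return 1
    # -T (GNU mv) replaces the link itself instead of moving into its target
    mv -T "$images_dir/current.tmp.$$" "$images_dir/current"
}
```

Rolling back is the same operation with the old image name, which is much of the appeal of keeping a single pointer.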
I wouldn't advise either rolling FC upgrades or frozen FC installs for all
cluster environments, but they certainly are reasonable alternatives for
at least some. FC is far from laughable as a cluster distro.
> SuSE is better than RH in some respects, worse in others. OpenSuSE - you
> may be on your own. SLED 10.2 may have licence costs?
Yeah, I dunno about SuSE. I tend to include it in any list because it
is a serious player and (as has been pointed out already in this thread
e.g. deleted below) only the serious players tend to attract
commercial/supported software companies. Still, as long as it and RH
maintain ridiculously high prices (IMHO) for non-commercial environments
I have a hard time pushing either one native anywhere but in a corporate
environment or a non-commercial environment where their line of support
or a piece of software that "only" runs on e.g. RHEL or SuSE is a
critical issue. Banks need super conservatism and can afford to pay for
it. Cluster nodes can afford to be agile and change, or not, as
required by their function and environment, and cluster builders in
academe tend to be poor and highly cost-sensitive. Most of them don't
need to pay for either one.
> Debian (and to a lesser extent Ubuntu) has the largest set of
> pre-packaged "stuff" for specialist maths that I know of and has
> reasonable general purpose tools.
Not to argue, but Scientific Linux is (like Centos) recompiled RHEL and
also has a large set of these tools including some physics/astronomy
related tools that were, at least, hard to find other places. However,
FC 6 is pretty insane. There are something like 6500 packages total in
the repo list I have selected in yumex on my FC 6 laptop (FC itself,
livna, extras, some Duke stuff, no freshrpms). This number seems to have
increased by around 500 in the last four weeks IIRC -- I'm guessing
people keep adding stuff to extras and maybe livna. At this point FC 6
has e.g. cernlib, ganglia, and much more -- I'm guessing that anything
that is in SL is now in FC 6 extras as SL is too slow/conservative for a
lot of people (as is the RHEL/Centos that is its base).
Debian may well have more stuff, or better stuff for doing numerical
work -- I personally haven't done a detailed package-by-package
comparison and don't know. I do know that only a tiny fraction of all
of the packages available in either one are likely to be relevant to
most cluster builders, and that it is VERY likely that anything that is
missing from either one can easily be packaged and added to your "local"
repo with far less work than what is involved in learning a "new" distro
if you're already used to one.
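For the RPM side, a "local" repo can be as little as a directory of RPMs with metadata generated by createrepo, plus a .repo file on each node. Here is a sketch of both halves. The repo id, URL, and directory are hypothetical. gpgcheck=0 keeps the sketch short, and signing your own packages and enabling the check is safer.

```shell
# Print a yum .repo stanza pointing nodes at a local repository on the
# head node. The repo id and URL passed in are placeholders.
# Note: gpgcheck=0 disables signature checking for this repo.
local_repo_file() {
    repo_id=$1
    base_url=$2
    cat <<EOF
[$repo_id]
name=Local cluster packages
baseurl=$base_url
enabled=1
gpgcheck=0
EOF
}

# Copy one or more RPMs into the repo directory and regenerate its
# metadata (requires the createrepo tool on the server).
add_to_local_repo() {
    repo_dir=$1
    shift
    cp -- "$@" "$repo_dir"/ && createrepo "$repo_dir"
}
```

On a node, something like `local_repo_file cluster-local http://head.example/repo > /etc/yum.repos.d/cluster-local.repo` (hypothetical names) is enough for yum to see the new packages. It can just as well go into the kickstart %post.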
The bottom line is that I think that most people will find it easiest to
install the linux distro they are most used to and will find that nearly
any of them are adequate to the task, EXCEPT (as noted) non-packaged or
poorly packaged distros -- gentoo and slackware e.g. Scaling is
everything. Scripted installs (ideally FAST scripted installs) and
fully automated maintenance from a common and user-modifiable repo base
are a necessity. There is no question that Debian has this. There is
also no question that most of the RPM-based distros have it as well, and
at this point with yum they are pretty much AS easy to install and
update and upgrade as Debian ever has been. So it ends up being a
religious issue, not a substantive one, except where economics or task
specific functionality kick in (which can necessitate a very specific
distro choice even if it is quite expensive).
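One possible shape for "fully automated maintenance" once every node points at a common repo is a loop on the head node that runs yum -y update on each node over ssh. This sketch assumes root ssh keys are already distributed and that a node list file exists (hypothetical format: one hostname per line, # for comments). DRY_RUN=1 only prints the commands. Running yum from cron on each node is an equally reasonable alternative.

```shell
# Run "yum -y update" on each node listed (one hostname per line) in a file.
# Blank lines and lines starting with '#' are skipped. With DRY_RUN=1 the
# commands are only printed.
update_nodes() {
    node_list=$1
    while read -r node; do
        case $node in
            ''|'#'*) continue ;;
        esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh root@$node yum -y update"
        else
            # -n stops ssh from swallowing the rest of the node list on stdin
            ssh -n "root@$node" yum -y update
        fi
    done < "$node_list"
}
```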
>>> I myself favor RH derived, rpm-based,
>>> yum-supported distros that can be installed by PXE/DHCP, kickstart, yum
>>> from a repository server. Installation of such a cluster on diskful
>>> systems proceeds as follows:
>>
>
> If I read the original post correctly, you're talking of an initial 8
> nodes or so and a head node. Prototype it - grab a couple of desktop
> machines from somewhere, a switch and some cat 5. Set up three machines:
> one head and two nodes. Work your way through a toy problem. Do this for
> Warewulf/Rocks/Oscar or whatever - it will give you a feel for something
> of the complexity you'll get and the likely issues you'll face.
Excellent advice. Warewulf in particular will help you learn some of
the solutions that make a cluster scalable even if you opt for some
other paradigm in the end.
A "good" solution in all cases is one where you prototype with a server
and ONE node initially, and can install the other six or seven by at
most network booting them and going off to play with your Wii and drink
a beer for a while. Possibly a very short while. If, of course, you
managed to nab a Wii (we hypothesized that Wii stands for "where is it?"
and not "wireless interactive interface" while shopping before
Christmas...;-). And like beer.
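Network booting the remaining nodes mostly comes down to one DHCP entry per node that hands out a fixed address and points at the PXE boot loader. Here is a sketch that prints an ISC dhcpd host block. The node name, MAC, addresses, and TFTP server in the example are hypothetical.

```shell
# Print an ISC dhcpd "host" block that gives one node a fixed address and
# tells it to fetch pxelinux.0 from the given TFTP server.
dhcp_host_entry() {
    name=$1
    mac=$2
    ip=$3
    tftp_server=$4
    cat <<EOF
host $name {
    hardware ethernet $mac;
    fixed-address $ip;
    next-server $tftp_server;
    filename "pxelinux.0";
}
EOF
}

# Example (hypothetical values):
# dhcp_host_entry node01 00:11:22:33:44:55 10.0.0.11 10.0.0.1 >> /etc/dhcpd.conf
```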
>> What I'd really like is for a kickstart compatible Debian/Ubuntu (but with
>> mixed 64/32 bit support for AMD64 systems). I know the Ubuntu folks started
>> on this [1], but I don't think they managed to get very far.
Yeah, kickstart is lovely. It isn't quite perfect -- I personally wish
it were a two-phase install, with a short "uninterruptible" installation
of the basic package group and maybe X, followed by a yum-based overlay
installation of everything else that is entirely interruptible and
restartable. But then, I <sigh> install over DSL lines from home
sometimes and get irritated if the install fails for any reason before
finishing, which over a full day of installation isn't that unlikely...
Otherwise, though, it is quite decent.
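The wished-for second phase could look roughly like this sketch. It installs the remaining packages one at a time and skips any that are already present, so an interrupted run can simply be restarted and picks up where it stopped. The check and install commands default to `rpm -q` and `yum -y install`, and can be overridden through the hypothetical CHECK_CMD/INSTALL_CMD variables. This is not a kickstart feature, just a script one might run after the short base install.

```shell
# Phase two of a two-phase install: install packages listed (one per line)
# in a file, skipping those already installed. Safe to rerun after a failure.
overlay_install() {
    pkg_list=$1
    : "${CHECK_CMD:=rpm -q}"            # exit status 0 if already installed
    : "${INSTALL_CMD:=yum -y install}"
    while read -r pkg; do
        [ -n "$pkg" ] || continue
        if $CHECK_CMD "$pkg" </dev/null >/dev/null 2>&1; then
            continue                    # done on a previous run
        fi
        # </dev/null keeps the installer from reading the package list
        $INSTALL_CMD "$pkg" </dev/null || return 1   # rerun later to resume
    done < "$pkg_list"
}
```

Installing one package per transaction is slower than one big transaction. The trade-off is that a dropped DSL line costs only the package in flight.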
> dpkg --get-selections > tempfile ; pxe boot for new node ; scp tempfile
> root@newnode:/root/ ; ssh newnode; dpkg --set-selections < /root/tempfile ;
> apt-get update ; apt-get dselect-upgrade
>
> goes a long way :)
Oooo, that sounds a lot like using yum to do an RPM-based install from a
"naked" list of packages and PXE/diskless root. Something that I'd do
if my life depended on it, for sure, but way short of what kickstart
does and something likely to be a world of fix-me-up-after-the-fact
pain. kickstart manages e.g. network configuration, firewall setup,
language setup, time setup, KVM setup (or not), disk and raid setup (and
properly layered mounting), grub/boot setup, root account setup, more.
The actual installation of packages from a list is the easy part, at
least at this point, given dpkg and/or yum.
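For comparison, the RPM analog of the dpkg selections trick is only the "easy part" in question. On each machine, capture installed package names with `rpm -qa --queryformat '%{NAME}\n'`. Then compute what the new node lacks and hand that list to yum. Here is a sketch of the difference step. It does none of the network, disk, or bootloader setup that kickstart handles.

```shell
# Print package names present in the reference list ($1) but missing from
# the node's list ($2). Both files hold one package name per line.
missing_packages() {
    ref_sorted=$(mktemp)
    node_sorted=$(mktemp)
    sort -u "$1" > "$ref_sorted"
    sort -u "$2" > "$node_sorted"
    comm -23 "$ref_sorted" "$node_sorted"   # lines only in the reference list
    rm -f "$ref_sorted" "$node_sorted"
}

# Example (file names hypothetical):
# missing_packages ref.txt node.txt | xargs -r yum -y install
```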
Yes, one can (re)invent many wheels to make all this happen -- package
up stuff, rsync stuff, use cfengine (in FC6 extras:-), write bash or
python scripts. Sheer torture. Been there, done that, long ago and
never again.
rgb
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu