[Beowulf] Which distro for the cluster?

Chris Samuel csamuel at vpac.org
Thu Dec 28 19:35:20 PST 2006


On Friday 29 December 2006 11:57, Andrew M.A. Cater wrote:

> Pick a distribution that you know that provides the maximum ease of
> maintenance with the maximum number of useful applications already
> packaged / readily available / easily ported. This will depend on your
> problem set: simulating nuclear explosions/weather storm cells/crashing
> cars or are you sequencing genomes/calculating pi/drawing ray traced
> images?

All of the above, and more... It makes life interesting sometimes.

User 1:  Why do you let all these single-CPU jobs onto the cluster?

5 minutes later...

User 2:  Why do you let one person hog 64 CPUs for one job?

> On Fri, Dec 29, 2006 at 09:39:59AM +1100, Chris Samuel wrote:
> > On Friday 29 December 2006 04:24, Robert G. Brown wrote:
>
> > > I personally would suggest that you go with one of the mainstream,
> > > reasonably well supported, package based distributions.  Centos, FC,
> > > RH, SuSE, Debian/Ubuntu.
> >
> > I'd have to agree there.
>
> Red Hat Enterprise based solutions don't cut it on the application
> front / packaged libraries in my (very limited) experience.

Our (500+) users usually fall into one of two camps:

1) They want to run commercial / 3rd-party codes that we provide for them, 
e.g. LS-Dyna, Abaqus, NAMD, Schrodinger, etc.

2) They are compiling code they've obtained from collaborators, colleagues, 
supervisors, or random websites, and they need compilers, MPI versions and 
supporting libraries.

There are a couple of people who use toolkits bundled with the OS (R is a good 
example), but only a few.

We avoid RHEL because of their lack of support for useful filesystems.

> The upgrade/maintenance path is not good - it's easier to start from scratch
> and reinstall than to move from one major release to another.

We treat all compute nodes (and, to a lesser degree, head nodes) as 
disposable: they should be rebuildable on a whim from a kickstart/autoyast 
profile and come up looking exactly the same as the rest.
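To make that concrete, a compute-node kickstart file can be quite small. This is only an illustrative sketch (the mirror URL, partitioning and package set below are placeholders, not our actual profile):

```
# Minimal compute-node kickstart sketch -- placeholder values throughout
install
url --url http://mirror.example.com/dist/os/x86_64/
rootpw --iscrypted $1$placeholder
clearpart --all --initlabel
autopart
reboot
%packages
@ base
openssh-server
%post
# site-specific setup (NFS mounts, queue system client, etc.) pulled
# from the head node would go here
```

Because the whole install is driven from this one file, a dead node can be netbooted and come back identical to its peers with no hands-on work.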

Our major clusters tend to last about as long as a major distro release (4 
years before they're due for replacement). Our users would get a bit upset if 
they found out all their code had suddenly stopped working because someone had 
upgraded, say, the system's C++ libraries under them.

That said..

> Fedora Core - you _have_ to be joking :) Lots of applications - but
> little more than 6 - 12 months of support.

...we do run a tiny Opteron cluster (16 dual-CPU nodes) with Fedora quite 
happily; it started off with FC2 and is now running FC5. It's likely to 
disappear, though, when the new 64-bit cluster happens next year.

> SuSE is better than RH in some respects, worse in others.

We find it's miles better in terms of filesystem support.  However, they 
blotted their copybook early on by releasing an update of lilo for PPC that 
didn't boot on our Power5 cluster.  Fortunately we tried it out on a single 
test compute node first, and they got a fix out.

But Novell's deal with MS hasn't done it any favours.

> OpenSuSE - you may be on your own. SLED 10.2 may have licence costs?

Never tried either of those; they're not supported by IBM's cluster management 
software (CSM).

> Debian (and to a lesser extent Ubuntu) has the largest set of
> pre-packaged "stuff" for specialist maths that I know of and has
> reasonable general purpose tools.

Agreed, but our users tend not to use those.

> If I read the original post correctly, you're talking of an initial 8
> nodes or so and a head node. Prototype it - grab a couple of desktop
> machines from somewhere, a switch and some cat 5. Set up three machines:
> one head and two nodes. Work your way through a toy problem. Do this for
> Warewulf/Rocks/Oscar or whatever - it will give you a feel for something
> of the complexity you'll get and the likely issues you'll face.

Amen!

> > What I'd really like is for a kickstart compatible Debian/Ubuntu (but
> > with mixed 64/32 bit support for AMD64 systems). I know the Ubuntu folks
> > started on this [1], but I don't think they managed to get very far.
>
> dpkg --get-selections >> tempfile ; pxe boot for new node ; scp tempfile
> root@newnode ; ssh newnode; dpkg --set-selections < /root/tempfile ;
> apt-get update ; apt-get dselect-upgrade
>
> goes a long way :)

Is that all the way to completely unattended? :-)
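For anyone who hasn't used it: the trick above works because dpkg's selections format is just one package name and state per line, so it's trivial to capture on a "golden" node, copy across, and replay. A small sketch, with invented package names for illustration (on a real node the list comes from `dpkg --get-selections` itself):

```shell
# Fake a tiny selections file to show the tab-separated
# "package<TAB>state" format that dpkg --get-selections emits.
cat > /tmp/selections.txt <<'EOF'
gcc	install
openmpi-bin	install
lam-runtime	install
EOF

# On a real golden node you would capture the live list instead:
#   dpkg --get-selections > /tmp/selections.txt
# and on the freshly netbooted node, replay it unattended:
#   dpkg --set-selections < /tmp/selections.txt
#   apt-get update && apt-get -y dselect-upgrade

wc -l < /tmp/selections.txt    # number of packages captured
```

The `-y` flag on apt-get is what gets you most of the way to unattended; debconf prompts are the remaining wrinkle.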

> > The sad fact of the matter is that often it's the ISV's and cluster
> > management tools that determine what choice of distro you have. :-(
>
> HP and IBM are distro neutral - they'll install / support whatever you
> ask them to (and pay them for).

Sadly that's not the case in our experience: IBM's CSM supports either RHEL or 
SLES (and lags the current updates, as they go through a huge testing process 
before release). This is mainly because CSM isn't just for HPC clusters; it 
also gets used for business HA and OLTP clusters.

This is why I find Warewulf an interesting concept, and the WareCat 
(Warewulf+xCat) especially so.

We don't have an HP cluster, but I know a man who does, and he dreads talking 
to their tech support: he has to explain, yet again, why he cannot go to the 
Start Menu and click on a particular icon, followed rapidly by why he cannot 
install Windows and call them back. :-(

This is quite sad, as Bdale is such an icon and HP use Debian on some of their 
firmware and diagnostic CDs.

All the best,
Chris
-- 
 Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
