[Beowulf] Which distro for the cluster?

Sun Jan 7 03:22:30 PST 2007

On Wed, Jan 03, 2007 at 09:51:44AM -0500, Robert G. Brown wrote:
> On Wed, 3 Jan 2007, Leif Nixon wrote:
> 
> >"Robert G. Brown" <rgb at phy.duke.edu> writes:
> >
>   b) If an attacker has compromised a user account on one of these
> workstations, IMO the security battle is already largely lost.  They
> have a choice of things to attack or further evil they can try to wreak.
> Attacking the cluster is one of them, and as discussed if the cluster is
> doing real parallel code it is likely to be quite vulnerable regardless
> of whether or not its software is up to date because network security is
> more or less orthogonal to fine-grained code network performance.
> 

Amen, brother :)

> 
> BTW, the cluster's servers were not (and I would not advise that servers
> ever be) running the old distro -- we use a castle keep security model
> where servers have extremely limited access, are the most tightly
> monitored, and are kept aggressively up to date on a fully supported
> distro like Centos.  The idea is to give humans more time to detect
> intruders that have successfully compromised an account at the
> workstation LAN level and squash them like the nasty little dung beetles
> that they are.
> 

Can I quote you for Security 101 when I need to explain this stuff to 
senior management?

> 
> And we didn't do this "willingly" and aren't that likely to repeat it
> ourselves.  We had some pretty specific reasons to freeze the node
> distro -- the cluster nodes in question were the damnable Tyan dual
> Athlon systems that were an incredible PITA to stabilize in the first
> place (they had multiple firmware bugs and load-based stability issues
> under the best of circumstances).  Once we FINALLY got them set up with
> a functional kernel and library set so that they wouldn't crash, we were
> extremely loath to mess with it.  So we basically froze it and locked
> down the nodes so they weren't easily accessible except from inside the
> department, and then monitored them with xmlsysd and wulfstat in
> addition to the usual syslog-ng and friends admin tools.
> 

It is _always_ worth browsing the archives of this list. Somebody, 
somewhere has inevitably already seen it/done it/got the scars and is 
able to explain stuff lucidly. I can't recommend this list highly enough, 
both for its high signal/noise ratio and its smart people [rgb 1-8 
inclusive, for example].

> 
> In general, though, it is very good advice to stay with an updated OS.
> My real point was that WITH yum and a bit of prototyping once every
> 12-24 months, it is really pretty easy to ride the FC wave on MANY
> clusters, where the tradeoff is better support for new hardware and more
> advanced/newer libraries against any library issues that one may or may
> not encounter depending on just what the cluster is doing.  Freezing FC
> (or anything else) long past its support boundary is obviously less
> desirable.  However, it is also often unnecessary.
> 

Fedora Legacy just closed its doors - if you take a couple of months 
to get your Uebercluster up and running, you're 1/3 of the way through 
your FC cycle :( It doesn't square. Fedora looks set to lose its way 
again for Fedora 7 as they merge Fedora Core and Extras and grow to 
n-000 packages again - the fast upgrade cycle, lack of maintainers and 
lack of structure do not bode well. They're apparently moving to a 13-month 
upgrade cycle - so your Fedora odd releases could well be three years apart. 
As far as I can see, the answer is to take a stable distribution, install 
the minimum and work with it, OR build your own custom infrastructure. 
Neither Red Hat nor Novell is cluster-aware in any detail - they'll 
support their install and base programs but don't have the depth of 
expertise to go further :(
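
The "1/3 of the way through" figure above is simple arithmetic, sketched 
here under two assumptions that are mine rather than anything official: 
"a couple of months" means two, and the release cadence is roughly six 
months. The function name is made up for illustration.

```python
from fractions import Fraction


def cycle_fraction_used(setup_months, cycle_months):
    """Fraction of one release cycle spent before the cluster is in production."""
    if cycle_months <= 0:
        raise ValueError("cycle_months must be positive")
    return Fraction(setup_months, cycle_months)


# Assumed inputs: 2 months of setup, 6-month release cadence.
used = cycle_fraction_used(2, 6)
print(used)  # 1/3
```

Plug in your own setup time and whatever cadence the distro actually 
keeps to see how much of the supported window is left once the cluster 
is in production.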

> On clusters that add new hardware, usually bleeding edge, every four to
> six months as research groups hit grant year boundaries and buy their
> next bolus of nodes, FC really does make sense as Centos probably won't
> "work" on those nodes in some important way and you'll be stuck
> backporting kernels or worse on top of your key libraries e.g. the GSL.
> Just upgrade FC regularly across the cluster, probably on an "every
> other release" schedule like the one we use.
> 

Chances are that anything Red Hat Enterprise based just won't work. New 
hardware is always hard. 

> On clusters (or sub-clusters) with a 3 year replacement cycle, Centos or
> other stable equivalent is a no-brainer -- as long as it installs on
> your nodes in the first place (recall my previous comment about the
> "stars needing to be right" to install RHEL/Centos -- the latest release
> has to support the hardware you're buying) you're good to go
> indefinitely, with the warm fuzzy knowledge that your nodes will update
> from a "supported" repo most of their 3+ year lifetime, although for the
> bulk of that time the distro will de-facto be frozen except for whatever
> YOU choose to backport and maintain.
> 

Absolutely.

> 
> Nowadays, with PXE/Kickstart/Yum (or Debian equivalents, or the OS of
> your choice with warewulf, or...) reinstalling OR upgrading a cluster
> node is such a non-event in terms of sysadmin time and effort that it
> can pretty much be done at will.  

I've had the pleasure/pain of watching cluster admins from a distance
as they worked on a fully commercial cluster from major vendors. For 
most on this list, it's a no-brainer. I wish I had seen the same.
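
For anyone who hasn't seen the PXE/Kickstart route mentioned above, here 
is a rough sketch of the per-node piece: generating a pxelinux config 
that boots the installer and points it at a Kickstart file. The host 
name, URL and label are hypothetical, and the kernel/initrd names are 
assumed to be the installer images copied into the TFTP root. "ks=" is 
the Anaconda boot option of this era. Treat it as an illustration, not 
a tested deployment recipe.

```python
def pxelinux_filename(mac):
    """Per-node config name pxelinux looks for under pxelinux.cfg/:
    '01-' (Ethernet) followed by the MAC in lowercase, dash-separated."""
    octets = mac.replace("-", ":").lower().split(":")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    int("".join(octets), 16)  # raises ValueError if any octet is not hex
    return "01-" + "-".join(octets)


def pxelinux_entry(ks_url, kernel="vmlinuz", initrd="initrd.img"):
    """Minimal config that boots the installer with a Kickstart URL."""
    return "\n".join([
        "DEFAULT reinstall",
        "LABEL reinstall",
        f"  KERNEL {kernel}",
        f"  APPEND initrd={initrd} ks={ks_url}",
        "",
    ])


# Hypothetical node and head-node URL, for illustration only.
print(pxelinux_filename("00:1A:2B:3C:4D:5E"))
print(pxelinux_entry("http://head.example/ks/node.cfg"))
```

Write the second string to a file named by the first under pxelinux.cfg/ 
on the TFTP server, and the node reinstalls on its next network boot. 
Remove the file afterwards, or the node will reinstall on every boot.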

> The worst thing that such a strategy might require is a rebuild of user
> applications for both distros, but with shared libraries, in my own
> admittedly anecdotal experience, this "usually" isn't needed going from
> older to newer (that is, an older Centos built binary will "probably"
> still work on a current FC node, although this obviously depends on the
> precise libraries it uses and how rapidly they are changing).  It's a
> bit harder to take binaries from newer to older, especially in packaged
> form.  There you almost certainly need an rpmbuild --rebuild and a bit
> of luck.
> 

I use Debian - I've never had to learn about more than one repository 
and one distribution for everything I need. What is this "rebuilding" of 
which you speak :)
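
For those who do end up moving binaries between distros, a quick sanity 
check before reaching for rpmbuild is to ask ldd which shared libraries 
the target node cannot resolve. A minimal sketch follows; the function 
names are made up for illustration and the sample output is invented. 
It only catches missing libraries, not symbol-version mismatches, so a 
clean result does not prove the binary will run.

```python
import subprocess


def missing_libs(ldd_output):
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd on a binary on this node and list unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libs(result.stdout)


# Invented sample of what ldd might print on a node lacking a GSL package.
sample = (
    "\tlibgsl.so.0 => not found\n"
    "\tlibm.so.6 => /lib/libm.so.6 (0x00b00000)\n"
    "\tlibc.so.6 => /lib/libc.so.6 (0x00a00000)\n"
)
print(missing_libs(sample))  # ['libgsl.so.0']
```

Run check_binary() on the node where the binary is supposed to run, not 
on the build host, since the answer depends on what is installed there.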

> Truthfully, cluster installation and administration has never been
> simpler.
> 

I think you underestimate your expertise - and the expertise on this 
list. My mantra is that cluster administration should be simple and 
straightforward: in reality, it's seldom so.

>    rgb

Andy