[Beowulf] distributions

Gerry Creager N5JXS gerry.creager at tamu.edu
Fri Feb 3 06:51:44 PST 2006

Robert G. Brown wrote:
> On Thu, 2 Feb 2006, Bill Rankin wrote:
>> On Feb 2, 2006, at 8:04 AM, Robert G. Brown wrote:
>>> What we do is use centos for servers (LAN/department servers, that also
>>> serve the cluster nodes with e.g. home and project space).  We use
>>> FC-even revision numbers for desktops, cluster nodes, etc.
>> And for a slightly different view of other clusters at Duke :) - we 
>> use Centos (currently 3.x, migrating to 4.x probably in the summer) on 
>> the large central shared cluster.  We have 500+ nodes right now and 
>> are growing constantly.  We currently support around two dozen 
>> different research groups

We've used SuSE on our original Opteron cluster, but we've loaded Rocks 
on the new Pentium-D nodes (not yet sure I made a good acquisition 
decision there...).

There are trade-offs in choosing any distro.  So far, both of 
these have had niggling little issues, but nothing that would stop us 
from using one or the other, or from migrating to something else.

> You should also point out that those nodes have been (over the same
> period) Intel only, originally i386 only.  We have had opterons since
> FC1, and FC1 (badly) and FC2 were the only game in town to get opteron
> support.  Also back then Centos wasn't really flying smoothly yet, at
> least on campus.

We tried FC1 for the opterons originally.  They had come loaded with 
SuSE 8 and there were issues there.  FC1 didn't work badly for us... it 
didn't work at all.  SuSE 9 and upgrades (infrequent upgrades, stability 
is good) have been pretty good.

Rocks has proven stable and predictable although there are some issues 
we've seen in our scheduling and monitoring that probably reflect our 
familiarity with other distros and systems.

> As I said, we're migrating more to Centos at this point for nodes as
> well, but it is a "no particular hurry" kind of migration, as we don't
> fix roofs when it doesn't rain...
> The main point is, using FCx isn't "crazy", it is just one thing one can
> do, if/when there are reasons to do so (benefits) that outweigh the
> hassles (costs).  Same as any other system decision process, with a
> similarly varied and highly nonlinear landscape that makes
> second-guessing WHY somebody's costs and benefits are what they are (or
> even just perceive them to be) a fruitless endeavor.

I've gotta agree with this assessment.

> In other words "YMMV".  There is a very wide range of distro options
> these days, and none of them are properly "crazy".  Well, maybe some are
> (and no I won't indicate which ones as my asbestos suit is at the
> cleaners:-), but FCx isn't one of them.

We are migrating to Solaris for our NFS servers.  One area we're not 
happy with in CentOS is the NFS service, even though we've tuned it 
pretty tightly.  I'm really hoping Solaris can fix some of our 
performance issues.


>> on the campus with a mixed application pool of everything from 
>> roll-yer-own MPI codes to commercial applications.  Heck, we even 
>> support Matlab to a very limited extent. :-)
>> For us, rolling out a new OS release is a major endeavor.  Lots of 
>> testing has to go on to verify that applications don't break on us and 
>> that all the tool sets that we need are available.  So for us, the 
>> stability of Centos is a big attraction.  Also one consideration for 
>> third party apps is making sure
> The stability, too, has a cost, though.  A number of my apps wouldn't
> compile at all with the GSL version that was standard in Centos 3 --
> missing whole functions I needed.  One then trades off rebuilding things
> like GSL from current source (not a bad idea, but definitely regular
> additional work IF any of your users need it) or using something like
> FCx that updates libraries more regularly and is hence more likely to
> have recently added functionality and performance and bugfix
> improvements.  Either way you're likely to have SOME work to do building
> things on the side especially if you (like Mark H.) like to keep your
> kernel bleeding edge current, or like to pick very specific high
> performance snapshots of e.g. libc and freeze on them.  Ultimately the
> issue is which one is more work for you.
> This issue has in the past been somewhat biased by virtue of the fact
> that we run FCx on desktops, making it easier to do both nodes and
> desktops from one distro (had to stabilize either one, right).  But we
> now do Centos and FCx either way, so again the point is moot.
> Ultimately, with PXE/kickstart/yum and things like warewulf, nearly
> ANYTHING like this becomes moot.  The unavoidable work is in setting up
> a cluster node and validating the distro you choose to install, and
> hooking it up to a trustworthy update stream.  At the end of a year,
> that update stream is almost certain to be nearly frozen out anyway,
> especially as far as cluster nodes behind a firewall are concerned.  So
> all that really matters is the compatibility stuff that you mentioned,
> plus how MUCH work it is to validate any given choice and
> build/rebuild/maintain stuff you need that is likely to be "prematurely"
> frozen out in a conservative distro.
>> that they officially "support" the OS platform, in case we have to 
>> deal with their technical support.
>> True, we do run into the cases where a user "must have" the 
>> latest/greatest version of some library they use on their FC desktop, 
>> or found in Debian and we try our best to accommodate them.  But I 
>> figure that we'll have that problem no matter what release of what OS 
>> we run.
> Exactly.
>    rgb
>> -bill

Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University	
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
