Fedora cluster project? (was Re: [Beowulf] Opteron/Athlon Clustering)

Jeffrey B. Layton laytonjb at charter.net
Wed Jun 9 18:14:51 PDT 2004

Mitchell Skinner wrote:

>On Tue, 2004-06-08 at 15:00, Robert G. Brown wrote:
>><rant> Ah, but you see, this is a religious issue for me for reasons of
>>long-term scalability and maintainability, so I don't even think of this
>>alternative.  Or if you like, I think it costs money in the short run,
>>and costs even more money in the long run, compared to participating in
>>Fedora or CaOSity and doing it once THERE, where everybody can share it.
>>But you knew that...;-)

   Well since RGB is tempting me to post I guess I'll do so :)
I've got a couple of comments about specifics below, but before
that let me make a few general comments.
    I've been using FC1 since 2 days after it came out. It's OK,
but I've had a few problems with it (it locks up about once a
week but I'm not sure that's the fault of FC1). I've found it to
be decent overall. However, I think it's a bit too cutting edge
for production clusters. The target for FC is more toward the
desktop IMHO. That's the wrong target for clusters. At the
ClusterWorld conference the panel on cluster distros was pretty
much in agreement that a cluster distro needs to be lighter
weight than a typical distro (I'll mention a few things below
on that).

>I've been kicking around the idea of starting a fedora-oriented cluster
>***Advantages of Fedora vs:
>RHEL rebuilds (Rocks, cAos)
>*  Don't need to do release engineering for the base distribution (cf.
>Rocks 3.1.0 bug where i686 kernels were installed on athlon machines)

   Since I use cAos quite a bit let me make a few comments about
it. cAos-1 is something like a "proof of concept" in that the developers
wanted to get the build process down before moving on. cAos-2, which
should be out very soon, will be much more advanced. cAos is not a
straight RHEL rebuild. I like to think of it as a RHEL rebuild as a
starting point, but with more thought into the other components. Also,
many of the cAos developers are cluster people so everything they do
is focused on clusters to some degree. So it's a well thought out distro
that works great on the desktop and works very well on clusters.

>*  More up to date

Up to date is not always good (I've been there before :)

>*  More stable than unstable, less archaic than stable
>*  freely available update stream
>***My goals for it would be:
>Social/political goals:
>1. Ease of installation ("yum install cluster-master")

cAos and Warewulf-2 can already do this.

>2. piggyback on other work (Fedora release engineering, mobs of people
>trying it out on commodity hardware)
>3. Encourage outside contributions (have a completely open devel
>process, use a license without an advertising clause)
>4. Be an integration point for applications ("yum install mpiblast")

cAos and Warewulf are working on that already (they have a few
packages already finished I think).

>5. Feed back upstream (to fedora and/or directly to maintainers)
>Technical goals/hopes:
>1. Organize as a set of add-on packages, rather than a whole
>distribution (like OSCAR, but without the extra complexity of multiple
>base distributions).  This means creating SRPMs that can be fed upstream
>(unlike rocks-sge, for example).
>2. Use RPM/anaconda to select architecture-specific files, like Rocks
>(handles heterogenous clusters more cleanly than systemimager (OSCAR) or
>network booting (warewulf))

I really don't like anaconda. Have you looked at the code - yuck! The
installer in cAos (cinch) is a much better idea. It's not a GUI, but that
is a good thing IMHO. You can even hack it since it's just a bash script!

>***Potential Objections:
>1.  "Fedora changes too frequently" - This is problematic in proportion
>to the pain of change.  One reason that change is painful is that people
>put it off, and then have to make a huge change all at once.  More
>frequent, more incremental changes can work, especially if you have the
>source to your apps.  This is assuming that the still-somewhat-untested
>fedora-legacy project doesn't work out; if it does then this objection
>is moot.  OTOH, if you want your closed-source ISV apps to be certified
>for your setup, then maybe this approach is not for you.

   Change is not a good thing for production clusters. Where I
work we typically don't upgrade the OS for the life of the
cluster (well, maybe one upgrade if we have to). The only reasons
we upgrade are a very noticeable performance advantage,
security issues, IT management insisting on an upgrade (usually
for security reasons only), or our applications requiring it.
   What we really require are updates, typically security updates,
for the life of the clusters. Our corporation has a board that
monitors patches, updates, etc. for our major operating systems
including Linux (although it sounds like "Big Brother" it's kind
of nice to have a dedicated group to monitor patches for us. They
also monitor licensing). Believe me, security and bug patches
for Linux are much easier to track than patches for other Unix
OS's (one OS in particular loved to issue bug patches with loads
of dependencies and then 2 months later issue a patch that rolled
back the previous patch - ugh!).  Also, the patches that we install
are binary only. We're not allowed to build the patches from
source since they are worried about patch skew across systems.
   A big problem for us is the commercial apps. They typically
require that you choose a distro from a small list. This forces us
to pick something like SuSE or RHEL. We can't pick
something else because it's not supported (and we're required to
have all of our apps supported). However, due to the changes
that RH and others have made, I don't think this is a viable
model for the future. As I've said before, I'd like to see commercial
app providers support something like a kernel/glibc combo
independent of the distro (as Greg has pointed out, Pathscale has
already gone this route). I'm encouraging all of our commercial
app companies to do this. We'll see if this has any effect or not.



>What do people think?
