[Beowulf] Which distro for the cluster?

Robert G. Brown rgb at phy.duke.edu
Sun Jan 7 12:49:50 PST 2007


On Sun, 7 Jan 2007, Joe Landman wrote:

>>> BTW, the cluster's servers were not (and I would not advise that servers
>>> ever be) running the old distro -- we use a castle keep security model
>>> where servers have extremely limited access, are the most tightly
>>> monitored, and are kept aggressively up to date on a fully supported
>>> distro like CentOS.  The idea is to give humans more time to detect
>>> intruders that have successfully compromised an account at the
>>> workstation LAN level and squash them like the nasty little dung beetles
>>> that they are.
>
> Yup.  Even better is never letting the users log in to admin machines.
> Provide machines for them to log into, submit and run jobs from.  Just
> not the admin nodes.

That would be the "servers have extremely limited access" part -- as in
sysadmins only.

> For what I call production cycle shops, those places which have to churn
> out processing 24x7x365, you want as little "upgrading" as possible, and
> it has to be tested/functional with everything.  Ask your favorite CIO
> if they would consider upgrading their most critical systems nightly.
>
> It all boils down to a CBA (as everything does).  Upgrading carries
> risk, no matter who does it, and how carefully things are packaged.  The
> CBA equation should look something like this:
>
> 	value_of_upgrade = positive_benefits_of_upgrade -
> 			   potential_risks_of_upgrade

I completely agree with this.  As I pointed out earlier in the thread,
companies such as banks make "conservative" seem downright radical when
it comes to OS upgrades.  They have to do a complete, thorough,
comprehensive security audit to change ANYTHING on their machines -- as
a requirement in federal law, IIRC.  To get them to take you seriously,
you MUST be prepared to support the OS they install on (once it is
successfully audited) forever -- until the hardware itself falls apart
into itty-bitty bits.
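
For what it's worth, Joe's CBA equation above can be written down as a
tiny sketch.  The class name, the function names and the numbers are all
made up for illustration (nobody in this thread has put figures on
either term); the only thing taken from the thread is the subtraction
itself.

```python
from dataclasses import dataclass


@dataclass
class UpgradeEstimate:
    """Hypothetical container for the two terms of the CBA equation.

    Both fields use the same arbitrary unit (dollars, admin-hours, ...).
    """
    positive_benefits_of_upgrade: float
    potential_risks_of_upgrade: float


def value_of_upgrade(estimate: UpgradeEstimate) -> float:
    # value_of_upgrade = positive_benefits_of_upgrade - potential_risks_of_upgrade
    return (estimate.positive_benefits_of_upgrade
            - estimate.potential_risks_of_upgrade)


def should_upgrade(estimate: UpgradeEstimate) -> bool:
    # Upgrade only when the estimated benefit outweighs the estimated risk.
    return value_of_upgrade(estimate) > 0


# Illustrative numbers only: a 24x7 production shop where downtime is
# expensive, and a research cluster that needs support for new hardware.
production_shop = UpgradeEstimate(2.0, 5.0)
research_cluster = UpgradeEstimate(6.0, 1.5)

print(value_of_upgrade(production_shop), should_upgrade(production_shop))
print(value_of_upgrade(research_cluster), should_upgrade(research_cluster))
```

The hard part is of course estimating the two inputs, not the
subtraction; a bank's audit requirement simply makes the risk term
enormous for any change at all.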

>>> On clusters that add new hardware, usually bleeding edge, every four to
>>> six months as research groups hit grant year boundaries and buy their
>>> next bolus of nodes, FC really does make sense as CentOS probably won't
>>> "work" on those nodes in some important way and you'll be stuck
>>> backporting kernels or worse on top of your key libraries e.g. the GSL.
>>> Just upgrade FC regularly across the cluster, probably on an "every
>>> other release" schedule like the one we use.
>>>
>>
>> Chances are that anything Red Hat Enterprise based just won't work. New
>> hardware is always hard.
>
> Heh.  Try to point this out to a purchasing agent on an RFP which
> demands a) newest possible hardware and b) RHEL 4 support.  You get to
> pick one or the other, not both.  Which one do you want?  Hint: "b" is
> far less valuable.
>
> The other (not-so-funny) aspect of this is when we deliver new hardware
> with an OS load that supports the newer hardware and someone wants to
> pull it back to the "corporate standard".  In doing so, they give up
> stability, performance, and often file system support.  Or in the case
> of our JackRabbit unit, when we deliver 30TB of 5U system and we get the
> "ext3 is almost as good as xfs" line.  Uh.... er.... no.   Those who
> really insist upon this must only want 16TB units with no possibility to
> ever grow beyond this (we have a design cooked up to show how to do a 1
> PB in 4 racks as a single file system, or better, an HA 1 PB in 9 racks
> as a single file system).  16TB is great for some folks, but it is a
> fundamental ext3 limit.  You need the untried-in-the-real-world ext4 to
> break that limit.  Or xfs and jfs.
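
The 16TB figure quoted above falls out of simple arithmetic, assuming
ext3 addresses blocks with 32-bit block numbers and is formatted with
4 KiB blocks (both are assumptions added here, not stated in the
quoted message).  A rough sketch, with hypothetical function and
constant names:

```python
BLOCK_NUMBER_BITS = 32   # assumed width of an ext3 block number
TIB = 2 ** 40            # bytes in one tebibyte


def max_filesystem_bytes(block_size_bytes: int) -> int:
    # Largest addressable size: number of distinct block numbers
    # multiplied by the size of each block.
    return (2 ** BLOCK_NUMBER_BITS) * block_size_bytes


limit = max_filesystem_bytes(4096)
print(limit // TIB, "TiB")

# A 30TB unit (decimal terabytes, as disks are sold) does not fit.
unit_bytes = 30 * 10 ** 12
print(unit_bytes > limit)
```

Under those assumptions the ceiling is 2^32 * 4096 = 2^44 bytes, i.e.
16 TiB, so a 30TB unit cannot be a single file system of that kind no
matter how it is partitioned underneath.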

Proving once again that Joe's company provides a valuable service,
because companies like this fill in an important gap between e.g. FC and
a customer's conservative needs.  However, I'll bet Joe is still just as
vulnerable to the other problem -- customer wants to run commercial
package X (which "requires" RHEL) but ALSO wants to run it on bleeding
edge hardware.  I'll bet you really earn your keep on those ones...

   ;-)

      rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




