[Beowulf] Which distro for the cluster?

Robert G. Brown rgb at phy.duke.edu
Sun Jan 7 12:49:50 PST 2007


On Sun, 7 Jan 2007, Joe Landman wrote:

>>> BTW, the cluster's servers were not (and I would not advise that servers
>>> ever be) running the old distro -- we use a castle keep security model
>>> where servers have extremely limited access, are the most tightly
>>> monitored, and are kept aggressively up to date on a fully supported
>>> distro like CentOS.  The idea is to give humans more time to detect
>>> intruders that have successfully compromised an account at the
>>> workstation LAN level and squash them like the nasty little dung beetles
>>> that they are.
>
> Yup.  Even better is never letting the users log in to admin machines.
> Provide machines for them to log into, submit and run jobs from.  Just
> not the admin nodes.

That would be the "servers have extremely limited access" part -- as in
sysadmins only.

> For what I call production cycle shops, those places which have to churn
> out processing 24x7x365, you want as little "upgrading" as possible, and
> it has to be tested/functional with everything.  Ask your favorite CIO
> if they would consider upgrading their most critical systems nightly.
>
> It all boils down to a CBA (as everything does).  Upgrading carries
> risk, no matter who does it, and how carefully things are packaged.  The
> CBA equation should look something like this:
>
> 	value_of_upgrade = positive_benefits_of_upgrade -
> 			   potential_risks_of_upgrade
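
A minimal sketch of the CBA equation quoted above, in Python.  The
names value_of_upgrade, positive_benefits and potential_risks come from
the quoted formula; weighting the risk as a failure probability times
the cost of a failure is an assumption added here for illustration and
is not something stated in the thread.  All figures are hypothetical
and in arbitrary cost units.

```python
def expected_risk(failure_probability, cost_of_failure):
    """Assumed risk model (not from the thread): probability times cost."""
    return failure_probability * cost_of_failure


def value_of_upgrade(positive_benefits, potential_risks):
    """The quoted equation: benefits minus risks."""
    return positive_benefits - potential_risks


# Hypothetical production shop: modest benefit, rare but expensive failure.
risk = expected_risk(failure_probability=0.125, cost_of_failure=400.0)
value = value_of_upgrade(positive_benefits=20.0, potential_risks=risk)

# A negative value argues against upgrading; a positive one argues for it.
should_upgrade = value > 0
print(value, should_upgrade)
```

With these made-up numbers the value comes out negative, which is the
production-shop case: a small benefit does not justify even a modest
chance of an expensive outage.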

I completely agree with this.  As I pointed out earlier in the thread,
companies such as banks make "conservative" seem downright radical when
it comes to OS upgrades.  They have to do a complete, thorough,
comprehensive security audit to change ANYTHING on their machines -- as
a requirement in federal law, IIRC.  To get them to take you seriously,
you MUST be prepared to support the OS they install on (once it is
successfully audited) forever -- until the hardware itself falls apart
into itty-bitty bits.

>>> On clusters that add new hardware, usually bleeding edge, every four to
>>> six months as research groups hit grant year boundaries and buy their
>>> next bolus of nodes, FC really does make sense as CentOS probably won't
>>> "work" on those nodes in some important way and you'll be stuck
>>> backporting kernels or worse on top of your key libraries e.g. the GSL.
>>> Just upgrade FC regularly across the cluster, probably on an "every
>>> other release" schedule like the one we use.
>>>
>>
>> Chances are that anything Red Hat Enterprise based just won't work. New
>> hardware is always hard.
>
> Heh.  Try to point this out to a purchasing agent on an RFP which
> demands a) newest possible hardware and b) RHEL 4 support.  You get to
> pick one or the other, not both.  Which one do you want?  Hint: "b" is
> far less valuable.
>
> The other (not-so-funny) aspect of this is when we deliver new hardware
> with an OS load that supports the newer hardware and someone wants to
> pull it back to the "corporate standard".  In doing so, they give up
> stability, performance, and often file system support.  Or in the case
> of our JackRabbit unit, when we deliver 30TB of 5U system and we get the
> "ext3 is almost as good as xfs" line.  Uh.... er.... no.   Those who
> really insist upon this must only want 16TB units with no possibility to
> ever grow beyond this (we have a design cooked up to show how to do a 1
> PB in 4 racks as a single file system, or better, an HA 1 PB in 9 racks
> as a single file system).  16TB is great for some folks, but it is a
> fundamental ext3 limit.  You need the untried-in-the-real-world ext4 to
> break that limit.  Or xfs and jfs.
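
The 16TB figure quoted above can be checked with a little arithmetic.
This sketch assumes the usual explanation for the ext3 limit, namely
32-bit block numbers combined with a 4 KiB block size; the function
name and the reading of "30TB" as 30 * 10^12 bytes are illustrative
assumptions, not something stated in the thread.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def max_fs_bytes(block_size_bytes, block_number_bits=32):
    """Largest addressable file system: block count times block size."""
    return block_size_bytes * 2 ** block_number_bits


# Assumed ext3 parameters: 4 KiB blocks, 32-bit block numbers.
limit = max_fs_bytes(4096)
limit_in_tib = limit // TIB

# Assumed size of the 30TB unit mentioned above, in decimal terabytes.
unit_bytes = 30 * 10 ** 12
fits_in_one_fs = unit_bytes <= limit

print(limit_in_tib, fits_in_one_fs)
```

Under these assumptions the limit is 2^32 * 2^12 = 2^44 bytes, and a
30TB unit cannot be presented as a single file system of that kind.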

Proving once again that Joe's company provides a valuable service,
because companies like this fill in an important gap between e.g. FC and
a customer's conservative needs.  However, I'll bet Joe is still just as
vulnerable to the other problem -- customer wants to run commercial
package X (which "requires" RHEL) but ALSO wants to run it on bleeding
edge hardware.  I'll bet you really earn your keep on those ones...

   ;-)

      rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




