[Beowulf] Which distro for the cluster?
Robert G. Brown
rgb at phy.duke.edu
Sun Jan 7 12:49:50 PST 2007
On Sun, 7 Jan 2007, Joe Landman wrote:
>>> BTW, the cluster's servers were not (and I would not advise that servers
>>> ever be) running the old distro -- we use a castle keep security model
>>> where servers have extremely limited access, are the most tightly
>>> monitored, and are kept aggressively up to date on a fully supported
>>> distro like CentOS. The idea is to give humans more time to detect
>>> intruders that have successfully compromised an account at the
>>> workstation LAN level and squash them like the nasty little dung beetles
>>> that they are.
> Yup. Even better is never letting the users log in to admin machines.
> Provide machines for them to log into, submit and run jobs from. Just
> not the admin nodes.
That would be the "servers have extremely limited access" part -- as in
> For what I call production cycle shops, those places which have to churn
> out processing 24x7x365, you want as little "upgrading" as possible, and
> it has to be tested/functional with everything. Ask your favorite CIO
> if they would consider upgrading their most critical systems nightly.
> It all boils down to a CBA (as everything does). Upgrading carries
> risk, no matter who does it, and how carefully things are packaged. The
> CBA equation should look something like this:
> value_of_upgrade = positive_benefits_of_upgrade -
I completely agree with this. As I pointed out earlier in the thread,
companies such as banks make "conservative" seem downright radical when
it comes to OS upgrades. They have to do a complete, thorough,
comprehensive security audit to change ANYTHING on their machines -- as
a requirement in federal law, IIRC. To get them to take you seriously,
you MUST be prepared to support the OS they install on (once it is
successfully audited) forever -- until the hardware itself falls apart
into itty-bitty bits.
>>> On clusters that add new hardware, usually bleeding edge, every four to
>>> six months as research groups hit grant year boundaries and buy their
>>> next bolus of nodes, FC really does make sense as CentOS probably won't
>>> "work" on those nodes in some important way and you'll be stuck
>>> backporting kernels or worse on top of your key libraries e.g. the GSL.
>>> Just upgrade FC regularly across the cluster, probably on an "every
>>> other release" schedule like the one we use.
>> Chances are that anything Red Hat Enterprise based just won't work. New
>> hardware is always hard.
> Heh. Try to point this out to a purchasing agent on an RFP which
> demands a) newest possible hardware and b) RHEL 4 support. You get to
> pick one or the other, not both. Which one do you want? Hint: "b" is
> far less valuable.
> The other (not-so-funny) aspect of this is when we deliver new hardware
> with an OS load that supports the newer hardware and someone wants to
> pull it back to the "corporate standard". In doing so, they give up
> stability, performance, and often file system support. Or in the case
> of our JackRabbit unit, when we deliver 30TB of 5U system and we get the
> "ext3 is almost as good as xfs" line. Uh.... er.... no. Those who
> really insist upon this must only want 16TB units with no possibility to
> ever grow beyond this (we have a design cooked up to show how to do a 1
> PB in 4 racks as a single file system, or better, an HA 1 PB in 9 racks
> as a single file system). 16TB is great for some folks, but it is a
> fundamental ext3 limit. You need the untried-in-the-real-world ext4 to
> break that limit. Or xfs and jfs.
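[Editor's aside: the 16TB figure quoted above can be reproduced with a little arithmetic. This is only a sketch under stated assumptions, none of which come from the thread itself: that ext3 addresses blocks with 32-bit block numbers, that the file system uses 4 KiB blocks, and that the "30TB" of the unit is decimal terabytes. The function name max_fs_bytes is made up for illustration.]

```python
# Sketch: largest file system addressable with fixed-width block numbers.
# Assumptions (not stated in the thread): 32-bit block numbers, 4 KiB blocks.

TIB = 2 ** 40  # one tebibyte in bytes


def max_fs_bytes(block_size_bytes: int, block_number_bits: int = 32) -> int:
    """Number of addressable blocks times the size of each block."""
    return (2 ** block_number_bits) * block_size_bytes


limit = max_fs_bytes(4096)
print(f"4 KiB blocks, 32-bit block numbers: {limit // TIB} TiB")

# Assumption: the "30TB" unit is 30 decimal terabytes.
unit_bytes = 30 * 10 ** 12
print("30TB unit fits in one such file system:", unit_bytes <= limit)
```

[Under those assumptions the ceiling works out to 16 TiB, which is smaller than a 30TB unit, so a single file system of that kind could not span the whole unit.]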
Proving once again that Joe's company provides a valuable service,
because companies like this fill in an important gap between e.g. FC and
a customer's conservative needs. However, I'll bet Joe is still just as
vulnerable to the other problem -- customer wants to run commercial
package X (which "requires" RHEL) but ALSO wants to run it on bleeding
edge hardware. I'll bet you really earn your keep on those ones...
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu