[Beowulf] OS for 64 bit AMD

Bob Drzyzgula bob at drzyzgula.org
Sun Apr 3 20:34:47 PDT 2005

I've been following this discussion, and I just wanted to
throw in my $0.02 on a couple of points:

 * I think that it's possibly a bit disingenuous to
   focus on the rapid cycling of FC-x releases. Perhaps
   someone will correct me, but I wasn't aware that the
   interfaces present in, e.g., FC-2 have ever changed
   all that much within a release. AFAIK the big changes happen at release
   boundaries, e.g. FC-2 to FC-3, like SELinux changing from
   default off to default on, and even then an upgraded
   machine will be less dramatically changed than one that
   had its drive wiped before installing the new OS.

   What is much more important in a true "production"
   environment is the length of time one can expect to
   obtain patches for the OS. No "production shop" that
   really is running a "production application" is likely
   to be replacing the OS on anything like the kind of
   schedule that FC-x -- or even RHEL -- releases come
   out. They are much more likely to qualify all their
   applications on a specific OS release, move this new
   image -- OS + applications -- into production, and run
   it until there is some compelling reason to change,
   and this compelling reason can be several years in
   coming. Even OS patches would only be applied in
   limited circumstances. These would be (a) to remedy a
   locally-observed failure mode, (b) to support required
   application updates, or (c) to address specific security
   issues. In all but the most severe security cases,
   such patches would be applied only after extensive
   testing to verify that production activities would not
   be affected.

   Now, in principle there is no real reason why --
   vendor support notwithstanding -- a production shop
   could not be set up to run on e.g. FC-3. However, the
   disappearance of the official patch stream after a few
   months would, or at least should, give one pause. Of
   course there is Fedora Legacy, and one can always
   patch the RPMs oneself. But it all starts to get
   pretty tenuous and labor-intensive after a while. By
   contrast, Red Hat is promising update support for each RHEL
   version for at least five years after release. *This*,
   not the release cycle, is why production shops -- and
   their application vendors -- will prefer RHEL over
   FC-x. It really doesn't (or shouldn't) make a damn
   bit of difference to a production shop how the OS is
   characterized: "beta", "proving ground", "enterprise",
   whatever. What really matters are the promises that are
   made with respect to out-year support.

   That being said, the product's longevity is a bit of
   a double-edged sword. To the extent that any part of
   a system's user base needs to move on -- to develop
   and/or implement new applications -- the age of the OS
   you are running can come back to hurt you. The latest
   versions of your software or hardware may simply not
   work with your rickety old OS. But this falls into the
   category of "compelling reasons to change" as I said
   above. Change control is a perpetual balancing act,
   but that just makes the long update life that much more
   important -- the last thing any production shop needs
   is another reason to have to change.

   This, of course, is how one finds out that one is not
   really a "production shop", after all -- when the demand
   for the latest and greatest is constantly trumping the
   "production" applications' inertial pull in the other
   direction. RHEL can suck pretty bad in a research
   environment, where you are likely to wind up with half
   of the RH-supplied packages supplemented with your own
   builds of more recent stuff piling up in /usr/local.

 * I get a bit frustrated at the hostility toward
   commercial applications and closed hardware, especially
   to the extent that it gets directed toward the customers
   of those products. If there existed an open replacement
   for SAS, for example, I can say without hesitation that
   we would be using it. Hell, if there were a *commercial*
   replacement for SAS, we'd probably be using it. There
   simply isn't -- there isn't even anything close. Same
   thing with Matlab [1] or Gauss. Yes, there is Scilab
   and Octave, but those only implement the bulk of the
   core functionality of Matlab.  The Matlab toolboxes
   are unique even in the world of commercial software.

   If you have a choice between solving a problem today or
   spending months writing the tool to solve the problem,
   the decision will most likely be based on (a) how much
   it will cost to develop the tool plus the cost of not
   having the solution for months (a cost likely to be
   measured, absent extensive analysis, in non-monetary units), and
   (b) what it would cost to have the tool today.

   For many commercial tools, there will be large classes
   of problems and circumstances falling on each side of
   this question. To the extent that organizations
   commonly find that it would be both tolerable and more
   cost-effective to wait for a locally-developed tool to
   solve a particular problem, we are much more likely to
   have an open-source tool available (Apache, anyone?) to
   solve that sort of problem. But in the case of SAS,
   for example, it appears that the people who find it
   practical to build a replacement tool either don't find
   it effective to release it as open source, don't find
   it practical to build in generally-applicable form,
   or simply don't exist.

   The only other approach is, of course, to find a
   different problem to solve, one that can be solved with
   existing, free tools. I suspect that this often happens
   in academia, but it is rarely practical in business
   or government.

   The same goes for closed hardware. I don't much
   care about high-end graphics cards, but storage
   is a big issue.  I've recently been looking for new
   storage for a sizable network, and am finding that the
   option of affordable external, high-speed (FC class)
   RAID controllers serving up generic, high-speed,
   high-reliability (i.e. not SATA) disk, has pretty much
   vanished from the market over the past year or so. As
   has been mentioned, everyone wants you to use their
   JBODs, their disk modules, and in some cases their
   HBAs and closed-source drivers. And they want you to
   pay dearly for it. I hardly find this acceptable, but
   I honestly don't know what else to do except to decide
   that capacity, throughput, reliability, availability and
   manageability just aren't that important after all.

--Bob Drzyzgula

[1] Matlab is actually a poor example for this discussion
in that, to their credit, MathWorks in fact requires
only, beyond a 2.4 or 2.6 kernel, a specific glibc
version (2.3.2).
