[Beowulf] OS for 64 bit AMD
Joe Landman
landman at scalableinformatics.com
Sun Apr 3 15:12:47 PDT 2005
Mark Hahn wrote:
>>>fully usable in a production environment.
>>I disagree with this, rather strongly. The Fedora series has had a
>>number of surprises for admins, for driver makers, for users, and so
>>forth. SE-Linux, 4k-stacks, glibc changes, etc. All of these wound up
>>in the supported release (i.e. the one for production environments).
>>Sure you can use it on your systems. Of course you can. If something
>>breaks on some commercial code that you might run, are you SOL? If you
>>don't run any commercial code, and have no liability issues associated
>>with using supported platforms, this is a moot point.
>
> you seem to be conflating "changelessness" with production-worthiness
> (or even "stability").
Uh... no. The changes were introduced quite quickly with little
preparation. Given the focus of Fedora, this makes perfect sense. For
a production class system, you do not make changes quickly (generalization).
> if you have a single-purpose cluster dedicated to some specific package,
> then by all means, lock it to whatever release/config/color the
> package's vendor likes the best.
Most of our customers' clusters are devoted to 3-6 packages, with some
subset devoted to larger numbers. If your system is used for in-house codes
with no need of guaranteed feature sets (including specific levels of
libraries, supporting packages, etc), by all means, use the distro or
packaging of your choice. If you have a dependency of any sort upon a
package that you do not have source to, you have an effective constraint
on your freedom. Most of our customers are using one or more commercial
codes to which they have no source code.
> but don't pretend that change across releases means that something
> is somehow not production-worthy, or that it's defensible for an app
> to depend on the distro, rather than the actual platform (ABI).
<sigh> Apps do depend on distros if you want support from the
commercial vendor, or if you need to defend your results in a legal
forum. The latter is rarely an issue for academic focused machines,
and is very much an issue for industrial research and development folks.
Don't pretend that since it may not apply to you that it doesn't apply
to everyone.
The rate of change of the distro, and its focus as a moving target (as
specifically indicated by the folks who make it), render it an
experimental platform (paraphrasing their words). Experimental
platforms do not a production system make. You can go argue the point
with Red Hat if you like; they freely admit that it is an experimental
platform. This system is designed to be the platform where Red Hat tests
things (i.e. a proving ground). Test systems are not production systems.
>>>only means that FC is on a shorter release cycle, and might contain
>>>the new puce-and-teal color scheme, which turns out to be a bad idea.
>>On the contrary, I don't think SE-Linux is a "puce-and-teal color scheme".
>>
>> Nor are 4k stacks (which broke many, many drivers). Yes, FC introduced
>
> they were all trivially disable-able. also, what commercial applications
> depend on the size of the kernel stack?
You made the insinuation that the only real release-to-release changes
were a "puce-and-teal color scheme", which I pointed out to be obviously
false. If you did not mean to insinuate it, maybe you can indicate what
you perceive the maximal-impact release-to-release changes to be.
As for code that depends upon the size of the kernel stack, read the
various forums on the drivers. The short version is that quite a few
drivers broke as a result of this (is a driver not important in your
view? It is to a commercial entity). The ones that affected me directly
were the Linuxant and nVidia drivers. Before you go off and bang on
their non-open-source nature, remember that they are applications people
will use, and before you deploy that nice workstation sporting the
nVidia FX3000 unit for visualization using ProStar or other engineering
codes, you really need the display driver to work. I have had a few
customers who insisted upon running FC-x with their nice graphics cards
to do their visualization work. Were they ever surprised. They made lots
of frantic calls to us to help them resolve this.
Here is a simple definition that I think will help frame the discussion
properly: a production-class OS should have very few surprises, and
support for the surprises that do arise. Is FC-x production class?
>>those. No, it was a significant shock when stuff stopped working. Is
>>that really production ready? (i.e. thorough testing and bug fixes so
>>that there will be no surprises)
>
> all you're saying, again and again, is that "production-worthy" to you
> means that the machine is configured exactly as your single app-vendor
> wants it. with this logic, nothing can ever change. actually, this
> approach is much of the reason that windows sucks so much.
<sigh> Wrong. Production-worthy means what I indicated above, though I am
quite sure other reasonable definitions are possible, or even more widely
accepted. Whether you like this or not (and I know I do not like it),
most commercial application vendors qualify their programs on very few
Linux distributions. Most folks in the commercial software world have
been burned in the past by "compatibility" and "ABI"s that were supposed
to work. If they are going to be held accountable for the quality (or
lack thereof), they are going to test on those distributions. Each
additional distribution adds costs/time (ask Greg; he just indicated as
much in his note on PathScale compiler platform support). Each additional
distribution adds complexity, as LAM 7.0.x may not work with 7.1.x
(remember the MPI ABI discussion? I sure as heck would like this, so I
don't need to have 6-7 different MPI implementations on each cluster),
or may return slightly different results from its function calls ...
>
>> Bottom line is (apart from Greg's company) I know of very few
>>commercial software vendors targeting FC-x as a supported platform. As
>
> this begs the question of whether commercial apps depend on behavior
> or configuration which is not standard on the platform. in the compiler
> world, for instance, dependence on undefined behavior is a bug.
Commercial app vendors tend to aim for the most widely accepted
platforms, and build for these. So if these platforms have oddities or
bad libraries/compilers (gcc 2.96), these are going to be carried over
into the application. If vendors really require some special feature of
a new library (LSTC with LAM, etc.), then they will likely build their
own and distribute it. That actually helps: if the app has dependencies
that it cannot count on the distro to provide, then it should carry
those dependencies itself... though this quickly leads to 7+ MPI
implementations on the cluster.
> FC is not a platform, Linux is. I'd be most curious to hear the explanation
> of how an app gets to be dependent on RHEL and will not work on other
> distributions which conform to the same API. or are you claiming that
> there is no ABI?
<sigh> What has this got to do with FC being production grade? The ABI
for FC has shifted. The ABI for RHEL-x has shifted too, though in a
defined manner, and that ABI will remain constant for a 5-year interval
after the RHEL-x release. FC-x will shift whenever needed. These shifts
of the FC ABI, the functionality changes, and the kernel changes that
fundamentally alter the way drivers work define the purpose of the
environment, and all of them contribute to the overall view of whether
FC is production ready or not. If you don't need commercial apps, or
better still, your commercial apps are supported on "Linux" and not on
"RHEL", then it doesn't matter what the underlying OS is. More to the
point, if the OS does not break drivers with upgrades and does not break
major functionality at each upgrade, then it is probably a
production-class OS. FC-x isn't that. One can easily make the same
argument about a certain OS from the northwest US (I always kick myself
after an upgrade, as they introduce something new that almost, but not
quite, works the same as it did before, and usually manages to break
compatibility with other bits).
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615