[Beowulf] OS for 64 bit AMD
Joe Landman landman at scalableinformatics.com
Sun Apr 3 15:12:47 PDT 2005
Mark Hahn wrote:
>>>fully usable in a production environment.
>>I disagree with this, rather strongly. The Fedora series has had a
>>number of surprises for admins, for driver makers, for users, and so
>>forth. SE-Linux, 4k-stacks, glibc changes, etc. All of these wound up
>>in the supported release (e.g. the one for production environments).
>>Sure you can use it on your systems. Of course you can. If something
>>breaks on some commercial code that you might run, are you SOL? If you
>>don't run any commercial code, and have no liability issues associated
>>with using supported platforms, this is a moot point.
>
> you seem to be conflating "changelessness" with productionworthiness
> (or even "stability").

Uh... no. The changes were introduced quite quickly with little preparation. Given the focus of Fedora, this makes perfect sense. For a production class system, you do not make changes quickly (generalization).

> if you have a single-purpose cluster dedicated to some specific package,
> then by all means, lock it to whatever release/config/color the
> package's vendor likes the best.

Most of our customers' clusters are devoted to 3-6 packages, with some subset running larger numbers. If your system is used for in-house codes with no need of guaranteed feature sets (including specific levels of libraries, supporting packages, etc.), by all means, use the distro or packaging of your choice.

If you have a dependency of any sort upon a package that you do not have source to, you have an effective constraint on your freedom. Most of our customers are using one or more commercial codes to which they have no source code.

> but don't pretend that change across releases means that something
> is somehow not production-worthy, or that its defensible for an app
> to depend on the distro, rather than the actual platform (ABI).

<sigh> Apps do depend on distros if you want support from the commercial vendor, or if you need to defend your results in a legal forum.
The latter is rarely an issue for academic focused machines, and is very much an issue for industrial research and development folks. Don't pretend that because it may not apply to you, it doesn't apply to anyone.

The rate of change of the distro, the focus of the distro, in that it is a moving target, specifically indicated by the folks who make it, render it an experimental platform (paraphrase of their words). Experimental platforms do not a production system make. You can go argue the point with Red Hat if you like; they freely admit that it is an experimental platform. This system is designed to be the platform where Red Hat tests things (i.e. a proving ground). Test systems are not production systems.

>>>only means that FC is on a shorter release cycle, and might contain
>>>the new puce-and-teal color scheme, which turns out to be a bad idea.
>>On the contrary, I don't think SE-Linux is "puce-and-teal color scheme".
>>
>> Nor are 4k stacks (that broke many many drivers). Yes, FC introduced
>
> they were all trivially disable-able. also, what commercial applications
> depend on the size of the kernel stack?

You made the insinuation that the only real release to release changes were "puce-and-teal color scheme", which I pointed out to be obviously false. If you did not mean to insinuate it, maybe you can indicate what you perceive the maximal-impact release to release changes to be.

As for code that depends upon the size of the kernel stack, read the various forums on the drivers. The short version is that there were quite a few broken drivers as a result of this (is a driver not important in your view? It is to a commercial entity). The ones that affected me directly were the Linuxant and nVidia drivers.
Before you go off and bang on their non-open source nature, remember that they are applications people will use, and before you go deploy that nice workstation sporting the nVidia FX3000 unit for visualization using ProStar or other engineering codes, you really need the display driver to work. I have had a few customers that have insisted upon running FC-x with their nice graphics cards to do their visualization work. Were they ever surprised. They made lots of frantic calls to us to help them resolve this.

Here is a simple definition that I think will help frame the discussion properly. A production class OS should have very few surprises, and support for the surprises that arise. Is FC-x production class?

>>those. No, it was a significant shock when stuff stopped working. Is
>>that really production ready? (e.g. thorough testing and bug fixes so
>>that there will be no surprises)
>
> all you're saying, again and again, is that "production-worthy" to you
> means that the machine is configured exactly as your single app-vendor
> wants it. with this logic, nothing can ever change. actually, this
> approach is much of the reason that windows sucks so much.

<sigh> Wrong. Production worthy means what I indicated above, though I am quite sure other reasonable definitions are possible or even more accepted.

Whether you like this or not (and I know I do not like it), most commercial application vendors qualify their programs on very few Linux distributions. Most folks in the commercial software world have been burned in the past by "compatibility" and "ABI"s that were supposed to work. If they are going to be held accountable for the quality (or lack thereof), they are going to try it. Each additional distribution adds costs/time (ask Greg, he just indicated as much in his note on PathScale compiler platform support). Each additional distribution adds complexity, as LAM 7.0.x may not work with 7.1.x (remember the MPI ABI discussion? I sure as heck would like this, so I don't need to have 6-7 different MPI implementations on each cluster), or may return slightly different results to their function calls ...

>
>> Bottom line is (apart from Greg's company) I know of very few
>>commercial software vendors targetting FC-x as a supported platform. As
>
> this begs the question of whether commercial apps depend on behavior
> or configuration which is not standard on the platform. in the compiler
> world, for instance, dependence on undefined behavior is a bug.

Commercial app vendors tend to aim for the most widely accepted platforms, and build for these. So if these platforms have oddities, or bad libraries/compilers (gcc 2.96), this is going to be carried over into the application. If they really require some special feature of a new library (LSTC with LAM, etc.), then they will likely build their own and distribute it. That actually helps, in that if the app has dependencies that it cannot count on the distro to provide, then it should carry those dependencies itself... though this leads quickly to 7+ MPI implementations on the cluster.

> FC is not a platform, Linux is. I'd be most curious to hear the explanation
> of how an app gets to be dependent on RHEL and will not work on other
> distributions which conform to the same API. or are you claiming that
> there is no ABI?

<sigh> What has this got to do with FC being production grade? The ABI for FC has shifted. The ABI for RHEL-x has shifted, though in a defined manner, and this ABI will remain constant for a 5 year interval after RHEL-x release. FC-x will shift when needed. These shifts of FC ABI, the functionality changes, the kernel changes that fundamentally alter the way drivers work define the purpose of the environment... all these contribute to the overall view of whether FC is production ready or not.
If you don't need commercial apps, or better still, your commercial apps are supported on "Linux" and not on "RHEL", then it doesn't matter what the OS underlying it is. More to the point, if the OS does not break drivers with the upgrades, and does not break major functionality at each upgrade, then it is probably a production class OS. FC-x isn't that.

One can easily make the same argument about a certain OS from the northwest US (I always kick myself after an upgrade, as they introduce something new that almost, but not quite, works the same as it did before, and usually manages to break compatibility with other bits).

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
