[Beowulf] OS for 64 bit AMD
Joe Landman landman at scalableinformatics.com
Sun Apr 3 17:55:06 PDT 2005
- Previous message: [Beowulf] OS for 64 bit AMD
- Next message: [Beowulf] OS for 64 bit AMD
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Mark Hahn wrote:

> this is utterly pointless, since we seem to disagree on axioms:

Not really. We disagree on basic definitions. Axioms are accepted "truths".

> correct code conforms to the standard; it is buggy if it depends
> on undefined (outside-the-standard) behavior.

We agree on this.

> the platform is the ABI, not the distribution. if you believe that
> the ABI doesn't cover enough, talk to the organization that manages it.

We disagree on this. This is not an axiom. RH9 is the prototypical case: it changed the ABI in a manner incompatible with an existing, functional ABI. At that point the platform became the distribution, as commercial vendors target the platforms (specifically RH) with the largest installed base. If the linux platform were truly distribution independent, then it would not matter what a code was compiled for, and frankly vendors would not need to QA against multiple distributions, as they would have the ABI.

Unfortunately this is not how it works. I would like it to work like this. It would be great if the LSB would in fact require certification, and if the application vendors were required to code to certification levels (I have been arguing this for years). Not likely to happen, but it would be nice.

> productionworthiness (PW) is behavioral stability, not some vendor's
> assertion about "support".

It is *long term* behavioural, driver, and interface stability. Changing an ABI midway through (4k stacks) is *not* behavioral stability. You have no real reason to expect a code to work correctly when you alter one of the critical underlying structures that it relies upon. Many drivers relied on 8k kernel stacks; it was in the ABI as a (de facto) standard.

RHEL3 did not (properly so) change its underlying kernel structures in a way that rendered some portions of the system unworkable. RHEL4 is not likely to change its underlying kernel structures in a way that renders some portions of the system unworkable.
FC-x is likely to change (and has changed) its underlying kernel structures.

> there is no data to suggest that a "supported" configuration
> is actually more stable - support is a matter of CYA and risk aversion.
> (not the actual risk; PW is the actual risk (well, inverse of it).)

It may in fact be less stable, though it is likely more conservative (which makes support easier), so the implication is that it is more stable. The fact is that the supported configurations are fundamentally averse to changing the underlying internals. This is not the case in FC-x (nor should it be, given its purpose).

> Fedora has normal release management, with pre-release testing
> as well as post-release updates. the pre-release testing is
> also known as "beta-testing".

So you have pre-release testing as "beta-testing" but you deny that "proving ground" is beta-testing? Seems to be the same side of the coin here.

Having normal release management does not a production quality system make. It is most definitely one of the requirements for such a system, but it does not, in and of itself, make the OS a production class OS.

A reasonable definition of a production class OS will likely incorporate inherent stability of the underlying structures of the system, and a guarantee that they will not change for some fixed interval. Production specifically implies repetitive behavior; specifically for HPC, a cycle shop. If the next incompatible change in FC-x renders the IB drivers unworkable for your cluster, is the OS that you have installed on the system production ready or not? If you have to continuously chase hacks/patches/etc. to keep your system operational after every upgrade, does that make your system production ready?

> > --
>
> the existence of commercial products which specify RH-whatever vX.Y
> does not magically turn FC into a beta-test. if you redefine words
> that way, you might as well call all of SunOS a beta for Solaris.

Er...
you are the only one who indicated this, so if you want to argue it, I would suggest you contact the person who generated this idea (that commercial products dependent upon RH make FC a beta test), who can be found at hahn _at_ physics _dot_ mcmaster _dot_ ca.

I said "My customers care about running on distributions (whoops, there we go with that word again) on which their apps are supported. I am not aware of active support for FC-x for applications from commercial program providers. If I am incorrect about this, please let me know (seriously, as FC-3++ looks to be pretty good)."

Prior to this I said "It is by Redhat's definition, a rolling beta (proving ground)."

The two are specifically independent ideas. I know of few commercially supported applications whose vendors will accept support calls from users running FC-x.

Note: Debian has very little in the way of commercial support (none from the distributor). It is most definitely not a beta. You can use the beta version in unstable. This is analogous to Fedora.

What makes FC a beta is that Redhat specifically notes that, and is using Fedora as a "proving ground" (c.f. http://dictionary.reference.com/search?q=proving+ground ), as in

"It is also a proving ground for new technology that may eventually make its way into Red Hat products." (from http://fedora.redhat.com/ )

From the reference.com site:

"prov·ing ground n. A place for testing new devices, weapons, or theories."

Would you call a system that is defined by its maker to be a proving ground a production environment (e.g. stable, unchanging)?

> the customer needs to evaluate how fragile a commercial product is:
> how well it conforms to the ABI. NVidia is a great example of
> an attractive product which is inherently fragile since NVidia
> chooses to hide trade secrets in a binary-only, kernel-mode driver
> which (by definition and example) depends on undefined behavior.
> VMWare is another good (flawed) example.

Hmmm.
I hear this argument time and again from people about the closed source nature of nVidia's drivers. nVidia does not (as far as I know) own all the intellectual property in their driver, and they do not have the right to give that IP away via GPL or any other mechanism.

The fundamental flaw in the arguments against the nVidia driver is an inherent presumption that nVidia is hiding trade secrets in order to make its life better and get end user lock-in. The behavior it (the driver) depends upon was built into the kernel, and when that behavior suddenly changed, nVidia's wasn't the only driver affected. Many open source drivers were impacted. Are you going to argue that this makes them (the open source drivers) inherently fragile? This is a natural extension and simple application of your argument.

This is a weak argument at best, and some of its fundamental premises are fatally flawed. If nVidia owned all the IP in everything they released, and chose simply to release binary only drivers, that would be a completely different case. Unfortunately, a fair amount of the IP in OpenGL and other related standards is owned by companies that have no interest in open source other than demolishing it. SGI sold off most of its IP in OpenGL to some other outfit.

> "supported configuration" is nothing more or less than a way to
> "download" support costs to the platform vendor (PV). it's a lever,
> acting on the customer as a pivot, to force the PV to avoid changes
> of any sort, since its impossible to tell what internals the proprietary
> product depends on.

Uh.... I think we disagree again.

A supported configuration is something that a customer, an end user, or a developer should have a reasonable and fighting chance of getting to work right. This means that the internals that are exposed to developers will not change (including driver developers).
This means that end users and customers have a reasonable expectation that their configuration on the supported list should work, and the onus is on the platform vendor (nice to see you switched to the definition of platform that I was using, BTW) to make it work without breaking other stuff.

> drastistically
> similarly, SOP in the Fibrechannel world is to provide only negative
> definitions of support (nothing but HP disks in HP SANs.) this can be
> seen as a flaw in standard-defining, since Ethernet provides a fairly
> decent counterexample where interoperability is the norm because
> products need to conform, not "qualify".

A standard is only useful if people pay attention to it, and engineer/design/build to it. Standards are very useful to developers, in that if they code in a particular manner that adheres to the standard, they have a fighting chance of developing something that will work. If the standard suddenly changes on them, and their stuff breaks, who do they turn to? If the target is moving, how much time/effort will they expend to chase it?

In some cases (development tools) it makes sense to chase some specific moving targets (though it costs time/effort and therefore real money). In other cases it makes sense to wait for stable releases where things will not change, so your customers/end users can get your stuff and make it work, because you have a fighting chance at making it work. Greg's company (and the folks at the Portland Group) have to chase these targets... many of their customers are there (I'd bet that only a small fraction of their collective total customer base is using the development tools to generate commercial code; most are using the tools for their research/development tasks).

Yeah, there are significant interoperability problems in things like SAN and what-not-else. These are unfortunate.
This is part of the reason why I try to avoid such things (I don't like vendors locking me in, and I know my customers don't like being locked in, so I don't waste my company's time trying to figure out how to do this).

Don't assume that a company's / end user's misapplication of a standard, hijacking of a standard, or abuse of a standard somehow makes all standards bad. They are not. Standards are sometimes the only lever you have in a commercial closed source context... demanding that a company adhere to what it claims to sell is sometimes a necessary path.

Interoperability means that when people interpret the standards, all parties agree on the definitions, they guarantee that their products will in fact conform to the standard, there will be tests of standard compliance, out-of-compliance systems will be adjusted to be in compliance, and interoperability with other standards will be guaranteed. This is why IDE, SCSI, and Ethernet work so well. This is why some others do not. IB is likely to work quite well going forward. This is also why the SAMBA folks are chasing a moving target, as the CIFS "standard" is a moving one (just go ahead and update that XP with a SAMBA server around .... grrrrr).

I like and use FC-x; we run FC-2 and FC-3 on various machines (AMD64, my laptop as part of a triple boot, and x86). I make sure our software runs on it; we compile and test on FC as well as on others (RH/Centos, SuSE, looking at Ubuntu/Debian). I am happy that our binary packages seem to work nicely across multiple distributions (though we usually bring the source along to be sure), and our large systems are built from source, so they should work (as long as the underlying technology works). Our software works at a high level, and depends upon lower level bits. I don't see the effect of the OS changes as much as the tool/hardware vendors do, though every now and then something breaks a driver.
But, and this is the critical point for us, if our software breaks at our customer's site, we own the fixes; it is our job to make them happen. More importantly, if something breaks in the chain of software (whether we own it or not), we try to help, as it is critical to make sure that failure modes are understood and problems are resolved. We have been and will be helping our customers resolve problems with third party software, commercial and otherwise.

If our target platform were moving, so that the C compiler structures were changing, and we had to rebuild time and time again with each OS update, I would wait until we saw this settle out. Otherwise we are spinning our wheels, as each change is more work, and in the end it should converge to a final state. It is the final state that is worth targeting (for us; others, such as PathScale, have to follow what their customers use).

The issue in FC-x is that it is open to internals changing. I think this is a good thing. It is doing what it was intended to do, and I like seeing the directions I need to worry about going forward. I will not likely deploy it as an OS for a cluster customer without the customer understanding exactly what they are getting, and making sure they understand what is needed to support it. If they really want a cheap RH, they can get Centos/Tao. If they want internal structural stability, and support from commercial vendors for their commercial codes, they will have to run something that the commercial vendors will support. PathScale and possibly the Portland Group (and I am going to guess Etnus and a few others) do or will likely support it. LSTC, MSC, Accelrys, Tripos, Oracle, ... will likely not (though it will probably run fine with no issues).

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
