[Beowulf] Which distro for the cluster?
Robert G. Brown rgb at phy.duke.edu
Sun Jan 7 08:41:57 PST 2007
- Previous message: [Beowulf] Which distro for the cluster?
- Next message: [Beowulf] Which distro for the cluster?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Sun, 7 Jan 2007, Andrew M.A. Cater wrote:

>> BTW, the cluster's servers were not (and I would not advise that servers
>> ever be) running the old distro -- we use a castle keep security model
>> where servers have extremely limited access, are the most tightly
>> monitored, and are kept aggressively up to date on a fully supported
>> distro like Centos.  The idea is to give humans more time to detect
>> intruders that have successfully compromised an account at the
>> workstation LAN level and squash them like the nasty little dung beetles
>> that they are.
>
> Can I quote you for Security 101 when I need to explain this stuff for
> senior management?

Sure.  If you arrange to get me there and pay me an exorbitant fee, I'll
bring along my sucker rod and explain it to them myself on your behalf.
(I wouldn't try to extort the exorbitant fee for it except that in my
experience, if management isn't paying you $150/hour plus expenses for
your expertise, they devalue it.  Besides, I've still got to get SOMEBODY
to pay for my kids' Nintendo W(here )i(s )i(t) -- once I actually find
one to purchase;-)

> It is _always_ worth browsing the archives of this list.  Somebody,
> somewhere has inevitably already seen it/done it/got the scars and is
> able to explain stuff lucidly.  I can't recommend this list highly
> enough, both for its high signal/noise ratio and its smart people [rgb
> 1-8 inclusive, for example].

Make that $200/hour...;-)

> Fedora Legacy just closed its doors -- if you take a couple of months
> to get your Uebercluster up and running, you're 1/3 of the way through
> your FC cycle :(  It doesn't square.  Fedora looks set to lose its way
> again for Fedora 7 as they merge Fedora Core and Extras and grow to
> n-000 packages again -- the fast upgrade cycle, lack of maintainers and
> lack of structure do not bode well.  They're apparently moving to a 13
> month upgrade cycle -- so your Fedora odd releases could well be three
> years apart.
> The answer is to take a stable distribution, install the minimum and
> work with it OR build your own custom infrastructure, as far as I can
> see.  Neither Red Hat nor Novell are cluster-aware in any detail --
> they'll support their install and base programs but don't have the
> depth of expertise to go further :(

Yeah, well, if RH asked me to be on their board of directors, I could
probably do something about a lot of this.  Their business plan is very
conservative and is "working" in that they are making money and keeping
their investors and the community simultaneously at bay, but they are
also missing multiple opportunities to really solidify and attack
Microsoft head on.  Fedora is working very well in many respects for
them -- I've been very happy with it since roughly FC 2 (FC 1 sucked,
partly because of the emergence of x86_64 and partly for other reasons).
I mean, what the heck, they're right down the road, right?  I could
probably drive to board meetings and they could pay me in options...;-)

> Chances are that anything Red Hat Enterprise based just won't work.
> New hardware is always hard.

Yeah, starting right here.  There are several boats RH is missing, but
this is the biggest one.  RH can freeze all sorts of things in any given
distro and just maintain them, but the kernel and its associated
hardware layer of support tools (and the applications that directly
access them) are NOT AMONG THEM!  Having fixed release and support
cycles measured in months or years is just silly.  Release cycles should
be determined by the way the product itself evolves, which in turn is a
marvelous and somewhat erratic function of the rapidly changing hardware
market and the whims of the toplevel developers (kernel, compiler, main
unix libraries, X).  Application space needs to be DECOUPLED from some
sort of sane base.  This is what they haven't yet grokked -- it is long
since past time for linux in general to separate into two distinct
pieces, e.g.
Fedora >>core<< (which should be really minimal but well maintained for
a "long time") and Fedora >>applications<<, which should be entirely
separate.  Precisely the same split should be visible in e.g. RHEL -- a
core that is large enough to support commercial applications, with
aggressive kernel and hardware-layer updates, and a number of distinct
layers of applicationware: X all by itself (separate from the core for
RHEL, since servers don't need it and it really isn't desirable there --
it's more to validate, more to secure), DB ware, userspace applications
in general, etc.  With yum, all the work of being able to support
partitioned maintenance on the server or workstation itself is DONE, but
the num-nums don't seem to realize it.

Microsoft would go mad (and go broke) if they tried to enforce a clean
rebuild of every application in the Universe for every new OS version
they release.  And of course they can't -- even though they've been
systematically engulfing makers of WinXX software for years, as rapidly
as antitrust laws permit them to do so, there are still so many
companies out there that make hardware with device drivers on disk, or
standalone software packages, that they pretty much have to distribute a
core OS and leave it up to the user to break the hell out of things with
Installshield and battling libraries from ill-built or out of date
software packages.

This is where RH missed the boat entirely.  Faced with a resource
problem as they tried to do the undoable, and given a space of possible
solutions, they opted for one of the simplest, but least efficient, of
those solutions.  What they NEEDED to do -- and still need to do -- is
think long and hard about just how to reorganize support of RHE linuces
(and/or FC) so it is BOTH efficient enough to remain within their means
and the abilities of their software people to deliver AND capable of
staying up to date on the kernel/core across the board.
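The partitioned maintenance that yum already makes possible can be
sketched as a split repo configuration.  This is a hedged illustration
only -- the repo names, baseurls, and directory are hypothetical, not
anything RH actually ships:

```shell
#!/bin/sh
# Sketch: partition a box into an aggressively-tracked "core" layer
# (kernel + hardware support) and a frozen "applications" layer.
# All names and URLs below are hypothetical.
REPODIR=$(mktemp -d)

cat > "$REPODIR/layers.repo" <<'EOF'
[core-updates]
name=Core (kernel and hardware layer, updated aggressively)
baseurl=http://mirror.example.org/linux/core/$releasever/$basearch/
enabled=1

[apps-frozen]
name=Applications (frozen for the support cycle)
baseurl=http://mirror.example.org/linux/apps/$releasever/$basearch/
enabled=0
EOF

# With a config like this, nightly maintenance could touch only the core:
#   yum --disablerepo='apps-*' --enablerepo='core-updates' -y update
grep -c '^\[' "$REPODIR/layers.repo"   # two partitioned repos defined
```

The point is that the partitioning lives entirely in repo metadata --
no new tooling is needed, just the will to split the distribution that
way.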
I can then think of all sorts of ways they could choose to layer
successive updates of application space.  In fact, "Fedora" could refer
ONLY to the aggressiveness of updates in the application layer.

At any rate, I have empirically found Centos to be nearly useless for
roughly half of each upgrade cycle on whole classes of hardware.  On
laptops it is a joke (except for one six-month window, perhaps, right
after it comes out).  On x86_64 hardware it has been a crap shoot.  Even
on i386 hardware, one has the usual problem with this device or that
device, especially in a desktop environment where users DO want their
onboard video or sound or network to work (on server class hardware and
apps it is more likely to work).  Even FC makes me wait on laptops and
some desktop hardware.

THIS is one of the two or three places where Lin still suffers relative
to Win -- Windows "always" works on any platform you buy because it is
"always" preinstalled, and vendors experience pain and suffering if it
doesn't preinstall in a functional state.  Lin requires me to spend a
quiet hour of moderately expert time googling and reading stuff from
specialized sites to determine which (if any) firewire PCMCIA cards are
known to work before I dare to buy one, which cameras are likely to
work, which video adapters or sound cards are supported, which
motherboard CHIPSETS are known to work.  Bitch, bitch, bitch.  Sigh.

>> Nowadays, with PXE/Kickstart/Yum (or Debian equivalents, or the OS of
>> your choice with warewulf, or...) reinstalling OR upgrading a cluster
>> node is such a non-event in terms of sysadmin time and effort that it
>> can pretty much be done at will.
>
> I've had the pleasure/pain of watching cluster admins from a distance
> as they worked on a fully commercial cluster from major vendors.  For
> most on this list, it's a no-brainer.  I wish I had seen the same.
Rather than say it is a no-brainer, perhaps it is fairer to say that
once one makes a relatively modest investment in training the brain to
use certain well-supported toolsets and ideas, it becomes easy and the
investment is paid back tenfold.  We're not quite to where we have a
"build-a-bear" GUI front end for cluster building, or a complete
"cluster package" in any of the major distros, as far as I know,
although the warewulf folks, and maybe the scyld folks, and possibly
some others are getting there in their own distinct ways.

Again, this is fairly silly.  Installing a cluster this way and
installing a workstation or office LAN this way (via PXE/KS/Y) are
really pretty much the same general task -- they differ only in package
selection and possibly -- I say possibly -- in the way workstations or
office systems are named.

Imagine a Red Hat sales rep walking into an office with a laptop (with a
gigE interface, an 8 port gigE switch with cables, and a halfway decent
fast disk).  He sets up the laptop and "borrows" four or five office
desktops and cables them into the switch.  He powers them on and sets
their BIOS to boot from the network first, with a standard 3 second or
whatever timeout.  They boot up, and -- magic! -- they are running
RHEL-whatever, with ooffice etc. installed and ready to run.  WinXX is
still untouched on their native disks.  Everything is bulletproof and
automaintaining, with a clear partition between userspace and rootspace,
full control over user accounts and access, etc.  He removes the systems
from the switch, puts them back on their native LAN, reboots them to
WinXX, and points out that installing and maintaining Lin is just that
easy.
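The PXE/Kickstart "non-event" described above boils down to a boot menu
entry under the TFTP root pointing at a kickstart file.  A minimal,
hedged sketch -- the paths, hostname, and kickstart URL are all
hypothetical stand-ins:

```shell
#!/bin/sh
# Sketch of the PXE side of an unattended node (re)install.  In real
# life this file would live under the TFTP root (commonly
# /tftpboot/pxelinux.cfg/default); here we write it to a temp dir.
TFTPROOT=$(mktemp -d)
mkdir -p "$TFTPROOT/pxelinux.cfg"

cat > "$TFTPROOT/pxelinux.cfg/default" <<'EOF'
default node-install
timeout 30

label node-install
  kernel vmlinuz
  append initrd=initrd.img ks=http://install.example.org/ks/node.cfg
EOF

# A node set to network-boot first picks this up, fetches node.cfg over
# HTTP, and kickstart installs it unattended -- the "reinstall at will"
# part of the argument.
grep 'ks=' "$TFTPROOT/pxelinux.cfg/default"
```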
He could have them set up overnight with a Lin server that supports
WinXX clients and Lin clients that boot just that way, and that permits
the office staff to gradually convert to Lin as they learn that it is
mostly virusproof, that ooffice pretty much just "works" like msoffice,
that a browser is a browser and firefox is a decent one, that there are
several hundred free nifty desktop games to while away those tedious
cubicle hours when nobody's looking, instead of three.  At a cost of
$50/seat -- and they can get rid of 2/3 of their admin staff at the same
time, because one admin can easily support 100-200 desktop seats...

Hey, I can dream, can't I?

>> The worst thing that such a strategy might require is a rebuild of
>> user applications for both distros, but with shared libraries, in my
>> own admittedly anecdotal experience, this "usually" isn't needed going
>> from older to newer (that is, an older Centos built binary will
>> "probably" still work on a current FC node, although this obviously
>> depends on the precise libraries it uses and how rapidly they are
>> changing).  It's a bit harder to take binaries from newer to older,
>> especially in packaged form.  There you almost certainly need an
>> rpmbuild --rebuild and a bit of luck.
>
> I use Debian -- I've never had to learn about more than one repository
> and one distribution for everything I need.  What is this "rebuilding"
> of which you speak :)

Ha.  I remember well the time that we considered Debian in our
department and rejected it, because its stable distro suffered from
precisely the same problem that RHEL/Centos suffer from now.  It was
very stable, it worked excellently well, and it was way, way behind the
hardware curve in libraries and kernel support.  It may well be that
they've since done a better job than RH at recognizing this as a core
user requirement in pretty much any environment, so that the stable
release tracks the kernel and new hardware better (dealing with
libraries and dependencies as required).
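Whether an older-distro binary will "probably" still work on a newer
node can be checked before trusting it: if all of its shared library
dependencies resolve on the target system, it usually runs.  A hedged
sketch (using /bin/sh as a stand-in for an older Centos-built
application binary):

```shell
#!/bin/sh
# Sketch: test whether a binary built on an older distro has any
# unresolved shared library dependencies on this system.  If ldd
# reports "not found" for anything, an rpmbuild --rebuild on the
# target release is the likely fix.
BIN=/bin/sh   # stand-in for the older application binary

if ldd "$BIN" | grep -q 'not found'; then
    echo "$BIN: unresolved libraries -- rebuild needed"
else
    echo "$BIN: all shared libraries resolve"
fi
```

This is the cheap first check; it obviously says nothing about symbol
versioning subtleties, which is where the "bit of luck" comes in.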
It would be pretty easy to do.  It's just unfortunate that Linux has
never QUITE managed to turn the corner and create clean layers of
separation between the hardware and kernel, the core libraries and
compiler, and application space.  Hence the need for distributions at
all, per se; hence the need for distribution "releases" with
applications pretty much all rebuilt just for the functional core in
question.

The weird thing is that in principle, both rpm and apt permit one to do
much better.  This is really a problem in computer science and software
design and OS organization that is SOLVABLE.  Packaging schema contain
the hooks required to do so, and the open source community has worked
out truly awesome methodology for maintaining a >>huge<< collection of
packages (I just grabbed images of FC 6 for i386 and x86_64 from Duke's
repo, and they ate close to 30 GB of disk for just the binary RPMs!).
The problem is all in the partitioning -- creating "independently"
maintainable layers.  I have modest hopes for HAL -- it was something of
a joke previous to FC 5 or 6, but in 6 it actually works perfectly and
transparently a lot of the time.  This is the kind of thing that is
necessary -- with enough abstraction it might be possible to maintain a
kernel snapshot "indefinitely" by simply updating its collection of
modules and hal itself, so that applications "just work" with new
hardware without having to upgrade to an unstable/rawhide release.

>> Truthfully, cluster installation and administration has never been
>> simpler.
>
> I think you underestimate your expertise -- and the expertise on this
> list.  My mantra is that cluster administration should be simple and
> straightforward: in reality, it's seldom so.

It depends on the paradigm you adopt, and how lucky you are in terms of
hardware matching the capabilities of your distro/release.
Which perhaps "shouldn't" be a matter of luck, but often is, as there is
nothing that can protect you from "lemon" hardware but buying from a
vendor that will, if necessary, completely replace it.  (Even
prototyping won't always reveal a problem -- it just "probably" will.)

IF you select hardware from a vendor that guarantees hardware
compatibility with any of the current/mainline distros -- and there are
several that do -- AND you select one of those mainline (well-supported
and automagically installable) distros AND you learn to master its
automagic installation techniques, then managing any sort of linux
operation, from a single machine to an organization-spanning LAN
consisting of an arbitrary mixture of servers, workstation/office LANs,
and clusters, has never been simpler.  That is a true statement.

A single repo mirror set, a single homemade package repo, and PXE permit
a single individual to provide ALL the software installation and
maintenance support required by a large company under these
circumstances.  Individuals can install linux on their own hardware (at
their own risk) at will from the repo(s), departments that follow the
hardware rules can install and maintain standardized systems any of a
number of ways, and in all cases a pro-class distribution updates all of
these systems in a fully automatic way, e.g. nightly, to the current
repo update level, making it easy to install new software or update old
software.  Cluster admins have it even easier, as their (linux distro
compatible) nodes are likely to be all IDENTICAL (in groups, at least,
over several generations), and homogeneity is the friend of the
administrator just as heterogeneity is Evil Incarnate.
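The "fully automatic way, e.g. nightly" amounts to a one-line cron hook
on every managed box pointed at the site repo mirror.  A hedged sketch
-- the script name and log path are illustrative, and in production it
would be dropped into /etc/cron.daily/ rather than a temp dir:

```shell
#!/bin/sh
# Sketch: the nightly self-maintenance hook a managed workstation or
# cluster node would carry, so one admin's repo updates propagate to
# every seat automatically.
CRONDIR=$(mktemp -d)   # stand-in for /etc/cron.daily

cat > "$CRONDIR/yum-nightly" <<'EOF'
#!/bin/sh
# Pull this box up to the current repo update level, unattended.
yum -y update >> /var/log/yum-nightly.log 2>&1
EOF
chmod +x "$CRONDIR/yum-nightly"

ls "$CRONDIR"
```

Push new software into the repo once, and every machine that carries
this hook picks it up within a day with no per-seat effort at all.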
Give me a switch and cables and a rackful of Penguin boxes (please!:-),
one equipped with a row of hot-swappable disks and a tape library, and
I'll take my laptop and its currently FC6-full backpack disk and return
you a functional cluster in the amount of time required to physically
assemble the nodes, plus less than a day to (re)install them with a
perfectly reasonable cluster configuration, very nearly independent of
the number of nodes or racks.  Give me a couple or three days and I can
probably arrange to install the cluster a couple or three different
ways -- diskful, diskless, mixed, scyldified.

Not ALL cluster needs would be satisfied by this, of course.  That's the
basic problem described in detail above.  If the cluster "required"
RHEL/Centos release X so it could run commercial package Y (and it
didn't just run anyway on FC6, which it probably would:-), and the
Penguin hardware "required" FC6 because older RHEL/Centos kernels just
don't support the network device or the dual core, dual CPU AMD x86_64
BIOS, then yeah, you enter one form of Linux Hell from which there is no
easy escape but to not get the unsupported hardware in the first place,
no matter how much your users beg for bleeding edge hardware, OR to get
your #&!@ software vendor that you are PAYING to REBUILD their damn
application for FC6 (and in the process, package the thing up so it
autobuilds as RPMs or whatever, at least as well as the maintainers of
the 6000-odd FREE packages in FC6 manage to package them up -- grrr), OR
to backport kernels and key libraries from FC6 to RHEL/Centos whatever
-- maybe, possibly, don't hold your breath.  Hell.

Yup.  Then again, yum and friends permitted RPM-derived linuces to
emerge from the long night of software dependency hell (where Debian had
long since stepped into the light).
It is time to really focus on hardware dependency hell and conditional
provisioning trees, both of which are well within the capabilities of
modern packaging systems and the general linux design.  Conditional
provisioning trees, in particular, could really revolutionize things and
perhaps make it possible to get away from the notion of the "complete
distribution release".  The current paradigm, which worked amazingly
well for on the order of a few hundred packages, does not scale to a few
thousand particularly well, and we're well on our way to 10 Kpkg and up
distribution releases, which will be a maintenance nightmare under the
current scheme.  I think, anyway.

The future should be interesting... as always.  It would be funny, in a
sick sort of way, if Windows managed to hold on in the face of linux
because it supports LESS software (but all of the hardware, nearly
perfectly).  Most people don't need more than a few hundred of the ~10
Kpkgs available.

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu