[Beowulf] Re: Good upgrade intervals (Was: Oldest functioning clusters)
Robert G. Brown rgb at phy.duke.edu
Tue Nov 23 10:43:27 PST 2004
- Previous message: [Beowulf] Good upgrade intervals (Was: Oldest functioning clusters)
- Next message: [Beowulf] Oldest functioning clusters
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tue, 23 Nov 2004, Josip Loncaric wrote:

> So, the work/cost optimal policy is roughly this:
>
> (1) Initially, pick a budget p which you can sustain every 3-4 years
> (2) Buy the highest performing system available at price p
> (3) Every time you can get about 5x performance at cost p, repeat (2)
>
> This simple calculation assumes complete equipment replacement at a
> fixed budget. The above does not take into account component upgrades
> along the way which may extend the useful life of the original
> equipment, nor inflation-adjusted budget increases. However, as Robert
> has pointed out, software is a moving target, and eventually old
> hardware just won't comfortably run new software.
>
> Each situation is a bit different, but the above "5x performance
> upgrades" rule is not a bad choice.
>
> Sincerely,
> Josip
>

The only really significant modification to Josip's beautiful math I'd
suggest is one associated with hardware reliability. I've encountered
two generically different kinds of hardware:

  a) Hardware that you buy and is almost totally trouble free
"forever". An original 1982 IBM PC, for example, was still running when
I gave it away to my kid's kindergarten in 1994 or thereabouts. Aside
from a single hard drive crash, it had never required any sort of
repair.

  b) #*!&@ hardware that has anything from a single bad component that
fails repeatedly to a totally bad mix of components that are prone to
failure. We've all seen this, some of us by specific brand or
configuration. This is hardware that, service contract or not, eats
(our) time and (somebody's, possibly our) money like crazy.

Josip's budget computation presumes "a" type hardware. The fraction of
"b-ness" of any given hardware batch shifts the curve, possibly
significantly, towards earlier replacement.

We have some node hardware that we are counting the days on, literally,
until we can decommission it and stop fixing it when it (regularly,
frequently) breaks.
It is "severe b" type and breaks repeatedly even when we replace
components with new ones (we've nearly totally rebuilt some of the nodes
several times with warranty replacement parts and parts we've bought
ourselves).

One cannot usually justify throwing grant-purchased hardware out and
asking for more before the third year is up, but if one wants to get OUT
to 3.5 years or more, try very hard to ensure the "a-ness" of your
hardware.

Another thing to insert: for reasonably a-like hardware IN its third+
year that still has some use in it (and has somebody else paying for
electricity;-) the policy is "eat your dead" -- let nodes fail, use them
for spare parts, and gradually let your node count diminish until you
cannot be bothered to take the time to mess with node repair out of the
boneyard (which happens, believe me).

This depends, of course, on having opportunity-cost time available or a
systems person with a bit of spare time on their hands. If your
operation already saturates your labor pool, it may be better to let
failure mean failure after year 3 and just donate the dead to a
recycling or charitable organization.

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
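
As a rough illustration of the arithmetic behind the quoted "5x
performance" rule and the reliability caveat, here is a minimal Python
sketch. Everything numeric in it is an assumption made for illustration
and not a figure from this thread: the 18-month doubling time for
performance per dollar, the function names, and the toy idea of
modelling "b-ness" as a simple uptime fraction measured against a fully
reliable replacement.

```python
import math


def years_to_speedup(factor, doubling_years=1.5):
    """Years until a fixed budget buys `factor` times the performance.

    Assumes performance per dollar grows exponentially, doubling every
    `doubling_years` years (an assumed figure, not one from the thread).
    """
    if factor <= 0 or doubling_years <= 0:
        raise ValueError("factor and doubling_years must be positive")
    return doubling_years * math.log2(factor)


def years_with_downtime(factor, availability, doubling_years=1.5):
    """Same threshold, measured against work the old nodes actually deliver.

    Toy model: "b-type" hardware is up only a fraction `availability` of
    the time, while the replacement is assumed to be fully reliable, so
    the raw speedup needed shrinks to factor * availability.
    """
    if not 0 < availability <= 1:
        raise ValueError("availability must be in (0, 1]")
    # A result below zero would mean "already past the threshold".
    return max(0.0, years_to_speedup(factor * availability, doubling_years))


if __name__ == "__main__":
    print(f"a-type hardware: {years_to_speedup(5):.2f} years to 5x")
    for up in (0.9, 0.8, 0.6):
        years = years_with_downtime(5, up)
        print(f"{up:.0%} uptime: {years:.2f} years to 5x delivered work")
```

Under the assumed 18-month doubling, the 5x point comes out at roughly
3.5 years, which sits inside the 3-4 year budget cycle in the quoted
policy. In this toy model, nodes that are up only 80% of the time pull
the same threshold in to about 3 years, which is one simple way to
picture the curve shifting towards earlier replacement.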
