[Beowulf] Re: Cooling vs HW replacement
Josip Loncaric josip at lanl.gov
Fri Jan 21 14:06:07 PST 2005
Robert G. Brown wrote:
>
>>> 2. higher reliability - typically 1.2-1.4M hours, and usually
>>> specified under higher load. this is a very fuzzy area, since
>>> commodity disks often quote 1Mhr under "lower" load.
>
> Has anyone observed that a megahour is 114 years? Has anyone observed
> that this is so ludicrous a figure as to be totally meaningless? Show
> me a single disk on the planet that will run, under load, for a mere two
> decades and I'll bow down before it and start sacrificing chickens.
>
> Humans don't live a megahour MTBF. Disks damn sure don't.

All of the above is true on a "per sample" basis. Moreover, with product cycles measured in months rather than years, none of the MTBF figures could possibly be based on actual MTBF measurements. Instead, manufacturers use composite statistics, computed from mid-life component failure rates, then quote MTBF as the reciprocal of this number.

This practice results in good MTBF numbers, but it amounts to stating that the life expectancy of a 10-year-old kid is 5000 years based on the 99.98% probability that the kid will survive the next year (these numbers are quoted from IEEE Spectrum, Sept. 2004, see http://www.spectrum.ieee.org/WEBONLY/publicfeature/sep04/0904age.html).

Both humans and machines fall apart at higher rates in infancy, as well as with age, when built-in redundancy wears thin due to accumulated damage. The disk drive MTBF number does not apply to drives that fail fairly quickly, nor to the failure rates of old or heavily used drives.

If, somewhat questionably, human life expectancy is taken as a guide, disk manufacturers' MTBF numbers ought to be de-rated by about a factor of 50-70 to make practical sense (e.g. a 1.4M hour MTBF drive might last some 25,000 hours) -- but even this applies only under nominal conditions, where the above-mentioned statistical MTBF estimate is not wildly inaccurate. In other words, a drive may last several years at 20 deg. C ambient temperature.

Still, this says nothing about its durability at 40+ deg. C. Given that in many systems failure rates increase exponentially with temperature, e.g. doubling for every 10 degree increase, I would avoid baking a drive unless it was specifically designed for high temperature operation (if such drives even exist).

Sincerely,
Josip
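The arithmetic in the message above can be checked with a short Python sketch. It is only an illustration: the function names are made up for this example, the 20 deg. C reference point is taken from the "nominal conditions" remark, and the doubling-per-10-degrees rule is the rough rule of thumb quoted in the text, not a measured property of any particular drive.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours, averaging in leap years


def hours_to_years(hours):
    """Convert a duration in hours to years."""
    return hours / HOURS_PER_YEAR


def mtbf_from_annual_failure_prob(p_fail_per_year):
    """MTBF in years, quoted as the reciprocal of a constant yearly failure rate.

    This mirrors the practice described in the message: a mid-life failure
    rate is assumed to hold forever, which ignores infant mortality and wear-out.
    """
    return 1.0 / p_fail_per_year


def derated_life_hours(mtbf_hours, derating_factor):
    """Practical lifetime estimate after dividing MTBF by a de-rating factor."""
    return mtbf_hours / derating_factor


def failure_rate_multiplier(temp_c, ref_temp_c=20.0, doubling_step_c=10.0):
    """Relative failure rate at temp_c, assuming it doubles every doubling_step_c.

    ref_temp_c and doubling_step_c are illustrative assumptions.
    """
    return 2.0 ** ((temp_c - ref_temp_c) / doubling_step_c)


if __name__ == "__main__":
    # A megahour expressed in years (about 114).
    print(f"1 Mhr = {hours_to_years(1.0e6):.0f} years")

    # 99.98% yearly survival gives a "life expectancy" of about 5000 years.
    print(f"MTBF-style life expectancy: {mtbf_from_annual_failure_prob(1 - 0.9998):.0f} years")

    # De-rating a 1.4M hour MTBF by a factor of 50 to 70.
    for factor in (50, 70):
        hours = derated_life_hours(1.4e6, factor)
        print(f"de-rated by {factor}: {hours:.0f} hours ({hours_to_years(hours):.1f} years)")

    # Failure rate at 40 deg. C relative to 20 deg. C under the doubling rule.
    print(f"failure rate multiplier at 40 C: {failure_rate_multiplier(40.0):.0f}x")
```

Under these assumptions the de-rated lifetime falls between 20,000 and 28,000 hours, which brackets the 25,000 hour figure above, and a drive at 40 deg. C would fail at roughly four times the rate of one at 20 deg. C.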
