[Beowulf] Re: Cooling vs HW replacement

Greg Lindahl lindahl at pathscale.com
Fri Jan 21 14:15:06 PST 2005


On Fri, Jan 21, 2005 at 03:10:31PM -0500, Robert G. Brown wrote:

> Has anyone observed that a megahour is 114 years?  Has anyone observed
> that this is so ludicrous a figure as to be totally meaningless?  Show
> me a single disk on the planet that will run, under load, for a mere two
> decades and I'll bow down before it and start sacrificing chickens.
> 
> Humans don't live a megahour MTBF.  Disks damn sure don't.

That's not what MTBF means.

A device has 3 phases in its life: infant mortality, middle age, and
old age. If you draw the failure rate, it looks like a bathtub:

F R \                                     /
a a  \                                   /
i t   \                                 /
l e    \_______________________________/
    infant        middle-age           old-age

The MTBF is the inverse of the failure rate in middle age, the flat
bottom of the tub. It does not say when old age starts. The MTBF is
usually much longer than the time a disk takes to reach old age, so
most disks survive to old age.

And yes, a megahour is the right scale for MTBF: a month is about 730
hours, so that just means that roughly 1 in 1400 disks dies per month
in middle age. If middle age lasts 3 years (about 26,000 hours), then
about 2.6% of disks will fail in middle age.
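
For anyone who wants to check the arithmetic, here is a minimal sketch.
It assumes the failure rate in middle age is constant at 1/MTBF (an
exponential model). That model, the 365-day year, and the name
fraction_failed are illustrative assumptions, not anything taken from a
vendor data sheet.

```python
import math

HOURS_PER_YEAR = 24 * 365               # 8760, ignoring leap years
HOURS_PER_MONTH = HOURS_PER_YEAR / 12   # 730


def fraction_failed(mtbf_hours, interval_hours):
    """Expected fraction of a disk population that fails within
    interval_hours, assuming a constant failure rate of 1/MTBF
    (hypothetical helper; exponential model is an assumption)."""
    return 1.0 - math.exp(-interval_hours / mtbf_hours)


mtbf = 1_000_000                        # one megahour
mtbf_years = mtbf / HOURS_PER_YEAR      # the "114 years" figure
monthly = fraction_failed(mtbf, HOURS_PER_MONTH)
three_year = fraction_failed(mtbf, 3 * HOURS_PER_YEAR)

print(f"MTBF of {mtbf} hours is {mtbf_years:.0f} years")
print(f"about 1 in {1 / monthly:.0f} disks fails per month")
print(f"{three_year:.1%} of disks fail during a 3-year middle age")
```

With these assumptions it comes out to roughly 1 in 1370 per month and
2.6% over three years, which matches the round numbers above. The
114-year figure is only the inverse of the failure rate expressed in
years; it is not a predicted lifetime for any one disk.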

-- greg



