[Beowulf] Are disk MTBF ratings at all useful?

mathog mathog at caltech.edu
Mon Apr 22 09:07:13 PDT 2013


In partial answer to the subject question, let us apply the mode of
analysis used by the drive manufacturers to human life expectancy, as
if humans were one of their products.  That is, what is the human AFR
and MTBF?  Unlike for disk drives, we can easily obtain a table of USA
mortality rates.  This one is for the year 2007:

http://www.cdc.gov/nchs/data/dvs/MortFinal2007_Worktable23r.pdf

Looking at the first row of the table, which is the data for the whole
country, we see that it has a bathtub-shaped curve: a relatively high
"early failure rate", which decreases to a minimum for the ages 5-14,
followed by an increasing "failure rate" with advancing years.

Now assume the "manufacturer" calculates the AFR assuming a "working
life" for the "product" of 20 years.  The total "failures" per 100,000
over that period, using the rates measured in 2007, were:

685.4 + 4*28.6 + 10*15.3 + 5*79.9 = 1352.3

Giving a 20-year failure rate of
1352.3 / 100000 = .013523
and an AFR of
.013523 / 20 = .000676, or .0676%.

So the MTBF for humans (in years, not hours) is 1/.000676 = 1479 years.
This number is just as nonsensical for people as 150 years is for
disks.
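
For anyone who wants to check the arithmetic, here is a minimal Python
sketch of the same calculation.  The rates and year counts are the ones
in the sum above; the age-band labels are my reading of the table's
usual groupings (only 5-14 is named above) and the function name is
made up for illustration.

```python
# Annual death rates per 100,000, as used in the sum above.
# Each entry: (assumed age-band label, years counted, rate per 100,000 per year).
AGE_BANDS = [
    ("under 1", 1, 685.4),
    ("1-4", 4, 28.6),
    ("5-14", 10, 15.3),
    ("15-19", 5, 79.9),  # only the first 5 years of the next band are counted
]


def afr_and_mtbf(bands, population=100_000):
    """Manufacturer-style AFR and MTBF over the 'working life' covered by bands."""
    years = sum(span for _, span, _ in bands)
    failures = sum(span * rate for _, span, rate in bands)
    afr = failures / population / years  # annualized failure rate (fraction per year)
    mtbf_years = 1.0 / afr               # MTBF in years, not hours
    return years, failures, afr, mtbf_years


years, failures, afr, mtbf_years = afr_and_mtbf(AGE_BANDS)
print(f"working life: {years} years")
print(f"failures per 100,000: {failures:.1f}")
print(f"AFR: {afr:.6f} ({afr * 100:.4f}%)")
print(f"MTBF: {mtbf_years:.0f} years")
```

Run as is, this reproduces the 1352.3, .000676 and 1479 figures.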

In the human case, since we have all the data, we can see exactly why
the result is so far off.  In rough terms the human mortality rate
doubles for every decade of age.  Consequently any AFR calculated up to
an age below the actual MTBF (average lifespan) will be an
underestimate, and the earlier the cutoff, the further off the value
will be.  This is on top of the other issue which affects the
calculations for disks: the definition of a "failed unit" used by the
manufacturers is much narrower than that employed by the end
users/vendors, so fewer failures get counted.
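
The cutoff effect can be illustrated with a toy model.  The sketch
below assumes a failure rate that doubles every 10 years, starting from
a made-up rate of .0002 per year at age 0 (an illustrative value, not
taken from the table), and ignores the early-failure bump.  It compares
the "MTBF" obtained from an AFR measured up to various cutoffs with the
model's actual mean lifespan.

```python
import math

H0 = 0.0002       # ASSUMED failure rate at age 0, per year (illustrative only)
DOUBLING = 10.0   # years for the failure rate to double (the rough rule above)


def cumulative_hazard(t):
    """Integral of H0 * 2**(s / DOUBLING) for s from 0 to t."""
    k = math.log(2) / DOUBLING
    return H0 / k * (math.exp(k * t) - 1.0)


def true_mean_lifespan(t_max=200.0, dt=0.01):
    """Mean lifespan: trapezoidal integral of the survival curve exp(-H(t))."""
    steps = int(t_max / dt)
    total = 0.0
    for i in range(steps):
        left = math.exp(-cumulative_hazard(i * dt))
        right = math.exp(-cumulative_hazard((i + 1) * dt))
        total += 0.5 * (left + right) * dt
    return total


def mtbf_from_cutoff(cutoff):
    """1/AFR, where AFR is the summed yearly failure rate up to cutoff, per year."""
    afr = cumulative_hazard(cutoff) / cutoff
    return 1.0 / afr


mean_life = true_mean_lifespan()
print(f"mean lifespan in the toy model: {mean_life:.1f} years")
for cutoff in (5, 10, 20, 40, 60):
    print(f"cutoff {cutoff:2d} years -> 'MTBF' {mtbf_from_cutoff(cutoff):.0f} years")
```

In this model every cutoff below the mean lifespan gives an "MTBF"
above it, and the shorter the cutoff, the larger the overestimate.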

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

