# [Beowulf] Are disk MTBF ratings at all useful?

mathog mathog at caltech.edu
Mon Apr 22 09:07:13 PDT 2013

```In partial answer to the subject question, let us apply the mode of
analysis used by the drive manufacturers to human life expectancy, as
if humans were one of their products. That is, what are the human AFR
(annualized failure rate) and MTBF (mean time between failures)? Unlike
for disk drives, we can easily obtain a table of USA mortality rates.
This one is for the year 2007:

http://www.cdc.gov/nchs/data/dvs/MortFinal2007_Worktable23r.pdf

Looking at the first row of the table, which is the data for the whole
country, we see that it has a bathtub-shaped curve: a relatively high
"early failure rate", which decreases to a minimum for ages 5-14,
followed by an increasing "failure rate" with advancing years.

Now assume the "manufacturer" calculates the AFR assuming a "working
life" for the "product" of 20 years. The total "failures" per 100,000
over that period, taking each age group's 2007 annual rate times the
number of years it contributes to the 20, were:

685.4 + 4*28.6 + 10*15.3 + 5*79.9 = 1352.3

This gives a 20-year failure rate of
1352.3 / 100000 = .013523
and an AFR of .013523 / 20 = .000676,
or .0676%.

So the MTBF for humans (in years, not hours) is 1/.000676 = 1479 years.
This number is just as nonsensical for people as 150 years is for
disks.

In the human case, since we have all the data, we can see exactly why
the result is so far off. In rough terms, the human mortality rate
doubles for every decade of age. Consequently, any AFR calculated up to
an age below the actual MTBF (average lifespan) will be an
underestimate, and the earlier the cutoff, the further off the value
will be. This is on top of the other issue which affects the
calculations for disks: the definition of a "failed unit" used by the
manufacturers is much less stringent than that employed by the end
users/vendors.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

```
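
For anyone who wants to rerun the arithmetic above, here is a minimal Python sketch. The rates are the ones quoted in the post (deaths per 100,000 per year, 2007). The age-group labels and the function name `afr_and_mtbf` are my own assumptions, inferred from the multipliers 1, 4, 10 and 5 in the sum, so check them against the CDC table before relying on them.

```python
# Death rates per 100,000 per year as quoted in the post (USA, 2007),
# paired with the number of years each group contributes to the window.
# ASSUMPTION: the age-group labels are inferred from the multipliers in
# the post's sum; they are not copied from the CDC table.
RATES = [
    ("under 1", 685.4, 1),
    ("1-4", 28.6, 4),
    ("5-14", 15.3, 10),
    ("15-19", 79.9, 5),  # assumed: first 5 years of a 15-24 group
]


def afr_and_mtbf(rates, per=100_000):
    """Return (window_years, failure_fraction, afr, mtbf_years).

    Mirrors the "manufacturer" method from the post: add up the failures
    over the window, divide by the window length to get an annualized
    failure rate, then take the reciprocal as the MTBF.
    """
    years = sum(n for _, _, n in rates)
    failures = sum(rate * n for _, rate, n in rates)
    fraction = failures / per
    afr = fraction / years
    return years, fraction, afr, 1.0 / afr


years, fraction, afr, mtbf = afr_and_mtbf(RATES)
print(f"{years}-year failure fraction: {fraction:.6f}")
print(f"AFR: {afr:.6f} ({afr:.4%})")
print(f"MTBF: {mtbf:.0f} years")
```

Extending `RATES` with later age groups from the same table lengthens the "working life" and should show how strongly the computed MTBF depends on where the window is cut off.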