[Beowulf] Are disk MTBF ratings at all useful?

Deepak Singh mndoci at gmail.com
Fri Apr 19 03:43:43 PDT 2013


Google published a study on disk failures. 

http://research.google.com/pubs/pub32774.html

They provide some interesting data on annualized failure rate (AFR) as a function of disk age, among other things.
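
For anyone who wants to put MTBF and AFR on the same scale, here is a rough sketch of the conversion. It assumes a constant failure rate (exponential lifetimes), which is only a simplification given that the study reports AFR varying with disk age. The 1,000,000 hour figure is borrowed from the Schroeder/Gibson paper title quoted below as a stand-in datasheet value, and the function names are just for illustration.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days; leap years ignored


def afr_from_mtbf(mtbf_hours):
    """AFR implied by an MTBF, ASSUMING a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def mtbf_from_fraction_failed(fraction_failed, years):
    """MTBF in hours implied by a fraction of drives failing within
    `years`, under the same constant-rate assumption."""
    rate_per_year = -math.log(1.0 - fraction_failed) / years
    return HOURS_PER_YEAR / rate_per_year


# Assumed datasheet value (taken from the paper title, not a real spec sheet).
datasheet_mtbf = 1_000_000
print(f"datasheet: {datasheet_mtbf / HOURS_PER_YEAR:.0f} years MTBF, "
      f"AFR {afr_from_mtbf(datasheet_mtbf):.2%}")

# Field observation quoted in this thread: 75% of one model dead in 6 years.
observed_mtbf = mtbf_from_fraction_failed(0.75, 6)
print(f"observed:  {observed_mtbf / HOURS_PER_YEAR:.1f} years MTBF, "
      f"AFR {afr_from_mtbf(observed_mtbf):.2%}")
```

Under those assumptions the datasheet figure works out to roughly 114 years and an AFR just under 1%, while 75% dead in 6 years corresponds to an AFR of about 20%, so the two differ by more than a factor of 20.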




Deepak 

On Apr 19, 2013, at 2:50, Fred Youhanaie <fly at anydata.co.uk> wrote:

> 
> 
> On 19/04/13 00:01, mathog wrote:
>> High end SATA and SAS disks claim MTBF values that work out to over 100
>> years, and yet it is a common observation that certain models fail at
>> rates entirely inconsistent with those values.  For instance, 75% of all
>> drives of one model dead in < 6 years.  (Cited by one poster in this
>> thread:
>> 
>> https://groups.google.com/forum/#!topic/comp.unix.solaris/zQjoyc8T01Y
>> 
>> ).  Additionally, manufacturer warranties at best only go to 5 years,
>> which suggests the manufacturers don't have a whole lot of faith in
>> their MTBF values.
>> 
>> Some of you have huge amounts of storage, how many disk models lasted
>> as long as their MTBF suggests they should?  (Personally we have only
>> one set of disks that are still consistent with the claimed MTBF, a set
>> of 6 Fibre Channel disks that came with a Sun server and are now 10
>> years old - with no failures.)
> 
> You may find this paper helpful; some of the data sets used in their studies come from large HPC sites:
> 
>    Bianca Schroeder, Garth A. Gibson
>    Understanding disk failure rates: What does an MTTF of 1,000,000 hours mean to you?
>    http://dl.acm.org/citation.cfm?doid=1288783.1288785
> 
> If you, or your institution, do not have access to the ACM publications, you may be able to find a free copy posted by the authors; ACM does allow that :)
> 
>> How do they come up with the MTBF values for disks anyway?  Clearly it
>> is not based on watching a large sample of disks for countless years!
> 
> I can't remember whether I read it in the above paper or elsewhere, but users in the field tend to replace disks at the first signs of failure, e.g. SCSI warnings, while manufacturers' tests may run to total failure, which leads to the manufacturers claiming longer MTTF/MTBF values.
> 
> Cheers
> Fred