<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div>Google published a study on disk failures. </div><div><br></div><div><span style="font-size: 15px; line-height: 19px; white-space: nowrap; -webkit-tap-highlight-color: rgba(26, 26, 26, 0.296875); -webkit-composition-fill-color: rgba(175, 192, 227, 0.230469); -webkit-composition-frame-color: rgba(77, 128, 180, 0.230469); -webkit-text-size-adjust: none; "><a href="http://research.google.com/pubs/pub32774.html">http://research.google.com/pubs/pub32774.html</a></span></div><div><br></div><div>Among other results, they provide some interesting data on annualized failure rate (AFR) as a function of disk age.</div><div><br></div><div><br></div><div>
<div class="page" title="Page 4">
<img src="cid:/page4image188" alt="page4image188" width="226.784851" height="170.088684">
</div><div class="page" title="Page 4"><br></div><div class="page" title="Page 4">Deepak </div><div class="page" title="Page 4"><br></div>On Apr 19, 2013, at 2:50, Fred Youhanaie &lt;<a href="mailto:fly@anydata.co.uk">fly@anydata.co.uk</a>&gt; wrote:<br><br></div><blockquote type="cite"><div><span></span><br><span></span><br><span>On 19/04/13 00:01, mathog wrote:</span><br><blockquote type="cite"><span>High end SATA and SAS disks claim MTBF values that work out to over 100</span><br></blockquote><blockquote type="cite"><span>years, and yet it is a common</span><br></blockquote><blockquote type="cite"><span>observation that certain models fail at rates entirely inconsistent</span><br></blockquote><blockquote type="cite"><span>with those values. For instance,</span><br></blockquote><blockquote type="cite"><span>75% of all drives of one model dead in &lt; 6 years. (Cited by one poster</span><br></blockquote><blockquote type="cite"><span>in this thread:</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span><a href="https://groups.google.com/forum/#!topic/comp.unix.solaris/zQjoyc8T01Y">https://groups.google.com/forum/#!topic/comp.unix.solaris/zQjoyc8T01Y</a></span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>). Additionally, manufacturer warranties at best only go to 5 years,</span><br></blockquote><blockquote type="cite"><span>which suggests the manufacturers</span><br></blockquote><blockquote type="cite"><span>don't have a whole lot of faith in their MTBF values.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Some of you have huge amounts of storage, how many disk models lasted</span><br></blockquote><blockquote type="cite"><span>as long as their MTBF suggests</span><br></blockquote><blockquote type="cite"><span>they should? 
(Personally we have only one set of disks that are still</span><br></blockquote><blockquote type="cite"><span>consistent with the claimed MTBF,</span><br></blockquote><blockquote type="cite"><span>a set of 6 Fibre Channel disks that came with a Sun server and are now</span><br></blockquote><blockquote type="cite"><span>10 years old - with no failures.)</span><br></blockquote><span></span><br><span>You may find this paper helpful; some of the data sets used in their studies come from large HPC sites:</span><br><span></span><br><span> Bianca Schroeder, Garth A. Gibson</span><br><span> Understanding disk failure rates: What does an MTTF of 1,000,000 hours mean to you?</span><br><span> <a href="http://dl.acm.org/citation.cfm?doid=1288783.1288785">http://dl.acm.org/citation.cfm?doid=1288783.1288785</a></span><br><span></span><br><span>If you, or your institution, do not have access to the ACM publications, you may be able to find a free copy posted by the authors; ACM does allow that :)</span><br><span></span><br><blockquote type="cite"><span>How do they come up with the MTBF values for disks anyway? Clearly it</span><br></blockquote><blockquote type="cite"><span>is not based on watching a large</span><br></blockquote><blockquote type="cite"><span>sample of disks for countless years!</span><br></blockquote><span></span><br><span>I can't remember whether I read it in the above paper or elsewhere, but users in the field tend to replace disks at the first signs of failure, e.g. SCSI warnings, while manufacturers' tests may run to total failure, which leads manufacturers to claim longer MTTF/MTBF values.</span><br><span></span><br><span>Cheers</span><br><span>Fred</span><br><span>_______________________________________________</span><br><span>Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org">Beowulf@beowulf.org</a> sponsored by Penguin Computing</span><br><span>To change your subscription (digest mode or unsubscribe) visit <a href="http://www.beowulf.org/mailman/listinfo/beowulf">http://www.beowulf.org/mailman/listinfo/beowulf</a></span><br></div></blockquote></body></html>