[Beowulf] Re: failure trends in a large disk drive population

Robert G. Brown rgb at phy.duke.edu
Fri Feb 16 15:13:45 PST 2007


On Fri, 16 Feb 2007, David Mathog wrote:

> Justin Moore wrote:
>>
>>> http://labs.google.com/papers/disk_failures.pdf
>>
>> Despite my Duke e-mail address, I've been at Google since July.  While
>> I'm not a co-author, I'm part of the group that did this study and can
>> answer (some) questions people may have about the paper.
>>
>
> Dangling meat in front of the bears, eh?  Well...

Hey Justin.  Are you going to stay in NC and move to the new facility as
they build it?

Let me add one general question to David's.

How did they look for predictive models in the SMART data?  It sounds
like they did a fairly linear data decomposition, looking for first-order
correlations.  Did they try, e.g., building a neural network on it, or
using fully multivariate methods (ordinary stats can handle that for up
to 5-10 variables)?

This is really an extension of David's questions below.  It would be
very interesting to add variables to the problem (if possible) until the
observed correlations resolve (in sufficiently high dimensionality) into
something significantly predictive.  That would be VERY useful.
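
To make the distinction concrete, here is a minimal sketch (Python, standard library only) contrasting a first-order correlation of each variable with the outcome against a fit that uses all variables jointly.  It assumes logistic regression as the multivariate method, chosen purely for illustration and not necessarily what the Google group did; the function names and the two-variable toy data are hypothetical, not SMART data from the paper.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """First-order (univariate) correlation of one variable with the outcome."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit P(fail | x) = sigmoid(w0 + w . x) by batch gradient descent on log-loss.

    All variables are estimated together, so the effect of each one is
    measured while holding the others fixed (unlike pearson() above).
    """
    d = len(X[0])
    w = [0.0] * (d + 1)  # w[0] is the intercept
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w: List[float], x: List[float]) -> float:
    """Predicted failure probability for one drive's variables x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


if __name__ == "__main__":
    # Hypothetical toy data: two made-up per-drive variables; "fails" when x1 > x2.
    X = [[1, 0], [2, 1], [3, 2], [0, 1], [1, 2], [2, 3]]
    y = [1, 1, 1, 0, 0, 0]
    for j in range(2):
        r = pearson([row[j] for row in X], y)
        print(f"first-order r for variable {j}: {r:+.2f}")
    w = fit_logistic(X, y)
    for xi, yi in zip(X, y):
        print(xi, yi, f"P(fail) = {predict(w, xi):.3f}")
```

On the toy data each variable alone correlates only moderately with the outcome (|r| about 0.52), yet the joint fit separates every case, because "failure" there depends on the difference between the two variables.  Real SMART data would need many more variables, held-out validation, and probably interaction terms or a nonlinear model such as the neural network mentioned above.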

     rgb

>
> Is there any info on failure rates versus the type of main bearing
> in the drive?
>
> Failure rate versus any other implementation technology?
>
> Failure rate vs. drive speed (RPM)?
>
> Or to put it another way, is there anything to indicate which
> component designs most often result in the eventual SMART
> events (reallocation, scan errors) and then, ultimately, drive
> failure?
>
> Failure rates versus rack position?  I'd guess no effect here,
> since that would mostly affect temperature, and there was
> little temperature effect.
>
> Failure rates by data center?  (Are some of your data centers
> harder on drives than others?  If so, why?)  Are there air
> pressure and humidity measurements from your data centers?
> Really low air pressure (as at observatory altitudes)
> is a known killer of disks; it would be interesting if lesser
> changes in air pressure also had a measurable effect.  Low
> humidity cranks up static problems, high humidity can result
> in condensation.  Again, what happens with values in between?
> Are these effects quantifiable?
>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu