[Beowulf] Re: failure trends in a large disk drive population

Robin Harker robin at workstationsuk.co.uk
Thu Feb 22 00:10:49 PST 2007


So if we now know this (and we have seen similarly spurious behaviour with
SATA RAID arrays), isn't the real solution to get rid of the node discs?

Regards

Robin


>
>>> How did they look for predictive models in the SMART data?  It sounds
>>> like they did a fairly linear data decomposition, looking for first-order
>>> correlations.  Did they try, e.g., building a neural network on it, or
>>> using fully multivariate methods (ordinary statistics can handle up to
>>> 5-10 variables)?
>>>
>>> This is really an extension of David's questions below.  It would be
>>> very interesting to add variables to the problem (if possible) until
>>> the observed correlations resolve (in sufficiently high dimensionality)
>>> into something significantly predictive.  That would be VERY useful.
>>>
>>
>> RGB, good idea: apply clustering/GA/MOGA analysis techniques to all of
>> this data.  Now the question is, will we ever get access to this data? ;)
>
> As mentioned in an earlier e-mail (I think) there were 4 SMART variables
> whose values were strongly correlated with failure, and another 4-6 that
> were weakly correlated with failure.  However, of all the disks that
> failed, less than half (around 45%) had ANY of the "strong" signals and
> another 25% had some of the "weak" signals.  This means that nearly a
> third (about 30%) of disks that failed gave no appreciable warning.  Therefore even
> combining the variables would give no better than a 70% chance of
> predicting failure.
>
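The bound in the paragraph above is simple addition, but it is worth spelling out because it caps what any model built on these variables can achieve, however multivariate. A minimal Python sketch, using the approximate 45% and 25% figures quoted there; the function name `recall_ceiling` is hypothetical, and the sketch assumes the two groups of failed disks are disjoint (as "another 25%" implies):

```python
# Approximate fractions of *failed* disks, as quoted in the message above.
STRONG_FRACTION = 0.45     # showed at least one "strong" SMART signal
WEAK_ONLY_FRACTION = 0.25  # showed only "weak" signals (ASSUMED disjoint from the above)


def recall_ceiling(strong: float, weak_only: float) -> float:
    """Best-case fraction of failures that SMART-based prediction could flag.

    A failed disk that showed no signal at all is indistinguishable, in these
    variables, from a healthy disk that also shows none, so no combination of
    the variables (linear, multivariate, or a neural network) can flag it.
    The ceiling is therefore the share of failed disks that showed any signal.
    """
    if min(strong, weak_only) < 0 or strong + weak_only > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    return strong + weak_only


ceiling = recall_ceiling(STRONG_FRACTION, WEAK_ONLY_FRACTION)
print(f"best-case recall: {ceiling:.0%}")      # share of failures that could be caught
print(f"silent failures:  {1 - ceiling:.0%}")  # failures that gave no warning at all
```

This bounds only how many failures could be caught; it says nothing about false alarms on healthy disks, which is the problem the next paragraph raises.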
> To make things worse, many of the "weak" signals were found on a
> significant number of disks.  For example, among the disks that failed,
> many had a large number of seek errors; however, over 70% of disks in the
> fleet -- failed and working -- had a large number of seek errors.
>
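Why a signal found on most of the fleet is only "weak" follows from Bayes' rule: the lift of a signal, P(fail | signal) / P(fail), equals P(signal | fail) / P(signal). So a signal present on 70% of all disks can at most multiply a flagged disk's failure probability by 1/0.7, about 1.4, even if every failed disk shows it; since the message says *over* 70%, the real ceiling is lower still. A small Python sketch of that reasoning follows. The 70% prevalence comes from the paragraph above; the 5% failure probability is a purely hypothetical placeholder, not a figure from the study, and the helper names are illustrative:

```python
def lift(p_signal_given_fail: float, p_signal: float) -> float:
    """P(fail | signal) / P(fail), which by Bayes' rule equals P(signal | fail) / P(signal)."""
    return p_signal_given_fail / p_signal


def precision(p_signal_given_fail: float, p_signal: float, p_fail: float) -> float:
    """P(fail | signal): the fraction of flagged disks that actually fail."""
    if p_signal_given_fail * p_fail > p_signal:
        # Failed disks showing the signal cannot outnumber all disks showing it.
        raise ValueError("inconsistent probabilities")
    return p_signal_given_fail * p_fail / p_signal


P_SIGNAL = 0.70      # share of the whole fleet with many seek errors (from the message)
P_FAIL = 0.05        # HYPOTHETICAL failure probability, for illustration only
BEST_CASE_HIT = 1.0  # optimistic assumption: every failed disk shows the signal

max_lift = lift(BEST_CASE_HIT, P_SIGNAL)
best_precision = precision(BEST_CASE_HIT, P_SIGNAL, P_FAIL)
print(f"maximum lift: {max_lift:.2f}x")
print(f"flagged disks that fail:     {best_precision:.1%}")
print(f"flagged disks that are fine: {1 - best_precision:.1%}")
```

Because precision here is at most P(fail) / 0.7, most flagged disks would be healthy unless more than 35% of the fleet were failing, so acting on this signal alone would mostly replace working drives.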
> About all I can say beyond what's in the paper is that we're aware of
> the shortcomings of the existing work and possible paths forward.  In
> response, we are
> <GOOGLE_NDA_BOT>
> Hello, this is the Google NDA bot.  In our massive trawling of the
> Internet and other data sources, I have detected a possible violation of
> the Google NDA.  This has been corrected.  We now return you to your
> regularly scheduled e-mail.
> [ Continue ]  [ I'm Feeling Confidential ]
> </GOOGLE_NDA_BOT>
>
> So that's our master plan.  Just don't tell anyone. :)
> -jdm
>
> P.S. Unfortunately, I doubt that we'll be willing or able to release the
> raw data behind the disk drive study.
>
> Department of Computer Science, Duke University, Durham, NC 27708-0129
> Email:	justin at cs.duke.edu
> Web:	http://www.cs.duke.edu/~justin/
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>


Robin Harker
Workstations UK Ltd
DDI: 01494 787710
Tel: 01494 724498



