[Beowulf] Re: failure trends in a large disk drive population

Mark Hahn hahn at mcmaster.ca
Fri Feb 16 15:01:43 PST 2007


> Is there any info for failure rates versus type of main bearing
> in the drive?

I thought everyone used something like the "thrust plate" bearing
that Seagate (maybe?) introduced ~10 years ago.

> Failure rate vs. drive speed (RPM)?

surely "consumer-grade" rules out 10k or 15k rpm disks;
their collection of 5400 and 7200 rpm disks is probably skewed
as well (since 5400s have been uncommon for a couple of years).

> Or to put it another way, is there anything to indicate which
> component designs most often result in the eventual SMART
> events (reallocation, scan errors) and then, ultimately, drive
> failure?

reading the article, I did wish their analysis more resembled
one done by clinical or behavioral types, who would have evaluated
the outcome against all the factors in combination, not one at a time.
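
to make "in combination" concrete, here is a minimal sketch of the 
simplest version: tabulate failure rate for every observed combination
of factors rather than for each factor alone.  the function name, the
field names (rpm, temp, failed) and the records are all hypothetical,
made up for illustration; none of it comes from the paper.

```python
from collections import defaultdict

def failure_rate_by_stratum(drives, factors):
    """Group drives by every observed combination of `factors`.

    Returns {combination: (failed, total, rate)}.
    """
    counts = defaultdict(lambda: [0, 0])  # combination -> [failed, total]
    for d in drives:
        key = tuple(d[f] for f in factors)
        counts[key][1] += 1
        if d["failed"]:
            counts[key][0] += 1
    return {k: (f, n, f / n) for k, (f, n) in counts.items()}

# Hypothetical records, invented purely for illustration.
drives = [
    {"rpm": 7200, "temp": "hot",  "failed": True},
    {"rpm": 7200, "temp": "hot",  "failed": False},
    {"rpm": 7200, "temp": "cool", "failed": False},
    {"rpm": 5400, "temp": "hot",  "failed": False},
    {"rpm": 5400, "temp": "cool", "failed": False},
    {"rpm": 5400, "temp": "cool", "failed": True},
]

rates = failure_rate_by_stratum(drives, ["rpm", "temp"])
for combo, (failed, total, rate) in sorted(rates.items()):
    print(combo, failed, total, rate)
```

a real analysis would want a proper model (and confidence intervals)
on top of this, since the per-combination counts get small quickly.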

> Failure rates versus rack position?  I'd guess no effect here,
> since that would mostly affect temperature, and there was
> little temperature effect.

funny, when I saw figure 5, I thought the temperature effect was
pretty dramatic.  in fact, all the metrics paint a pretty clear
picture of infant mortality, then reasonably fit drives surviving
their expected operational life (3 years).  in senescence, all
forms of stress correlate with increased failure.  I have to
believe that the 4th/5th-year decreases in AFR are either due to
survival effects or sampling bias.
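
a rough sketch of what I mean by a survival effect, under assumptions
that are entirely mine: suppose the population is a mix of a "frail"
group and a "robust" group, each with a constant annual failure
probability.  the group sizes and probabilities below are invented,
not taken from the paper.  the population-wide AFR still falls year
over year, simply because the frail drives get culled first.

```python
def observed_afr(groups, years):
    """Population-wide AFR per year for a mixed drive population.

    groups: list of (initial_count, annual_failure_probability).
    AFR for a year = failures during the year / drives alive at its start.
    """
    alive = [float(n) for n, _ in groups]
    probs = [p for _, p in groups]
    afr = []
    for _ in range(years):
        failures = [a * p for a, p in zip(alive, probs)]
        afr.append(sum(failures) / sum(alive))
        alive = [a - f for a, f in zip(alive, failures)]
    return afr

# Hypothetical mix: 200 frail drives (30%/year), 800 robust drives (2%/year).
afr = observed_afr([(200, 0.30), (800, 0.02)], years=5)
for year, rate in enumerate(afr, start=1):
    print(f"year {year}: AFR = {rate:.3f}")
```

no individual drive gets more reliable with age in this toy model;
the falling AFR comes entirely from the changing mix of survivors.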

> changes in air pressure also had a measurable effect.  Low
> humidity cranks up static problems, high humidity can result

does anyone have recent-decade data on the conventional wisdom
about too-low humidity?  I'm dubious that it matters in a normal
machineroom where components tend to stay put.

regards, mark hahn.


