[Beowulf] Re: failure trends in a large disk drive population
Mark Hahn
hahn at mcmaster.ca
Fri Feb 16 15:01:43 PST 2007
> Is there any info for failure rates versus type of main bearing
> in the drive?
I thought everyone used something like the "thrust plate" bearing
that Seagate (maybe?) introduced ~10 years ago.
> Failure rate vs. drive speed (RPM)?
surely "consumer-grade" rules out 10k or 15k rpm disks;
their collection of 5400 and 7200 rpm disks is probably skewed
as well (since 5400s have been uncommon for a couple of years).
> Or to put it another way, is there anything to indicate which
> component designs most often result in the eventual SMART
> events (reallocation, scan errors) and then, ultimately, drive
> failure?
reading the article, I did wish their analysis more resembled
one done by clinical or behavioral types, who would have evaluated
how outcome depends on all the factors in combination, not one
factor at a time.
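
to make "in combination" concrete, here is a minimal sketch of the
kind of breakdown I mean: failure fraction per cell of every factor
combination. the records, the factor names and the helper
failure_rate_by_cell are all hypothetical, invented for
illustration; none of it comes from the paper.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration only (not from the paper):
# (rpm, temperature band, age in years, failed?)
drives = [
    (5400, "cool", 1, False),
    (5400, "hot", 1, True),
    (7200, "cool", 1, False),
    (7200, "hot", 3, True),
    (7200, "hot", 3, False),
    (7200, "cool", 3, False),
]

def failure_rate_by_cell(records):
    """Map each (rpm, temp, age) combination to (failure fraction, drive count)."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [failures, total]
    for rpm, temp, age, failed in records:
        cell = counts[(rpm, temp, age)]
        cell[0] += int(failed)
        cell[1] += 1
    return {key: (f / n, n) for key, (f, n) in counts.items()}

# Print one line per factor combination that actually occurs in the data.
for cell, (rate, n) in sorted(failure_rate_by_cell(drives).items()):
    print(cell, f"{rate:.0%} of {n}")
```

with real data most cells would be thinly populated, which is why
the drive count is reported next to each fraction.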
> Failure rates versus rack position? I'd guess no effect here,
> since that would mostly affect temperature, and there was
> little temperature effect.
funny, when I saw figure 5, I thought the temperature effect was
pretty dramatic. in fact, all the metrics paint a pretty clear
picture of infant mortality, then reasonably fit drives surviving
their expected operational life (3 years). in senescence, all
forms of stress correlate with increased failure. I have to
believe that the 4th/5th-year decreases in AFR are either due to
survival effects or sampling bias.
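
a toy sketch of what I mean by a survival effect, assuming a
population that is a mix of "frail" and "robust" drives, each with
a constant yearly failure probability. the 30%/3% rates, the 30%
frail fraction and the function observed_afr are assumptions picked
for illustration, not numbers from the paper.

```python
def observed_afr(frail_fraction, frail_rate, robust_rate, years):
    """Return the population-wide annualized failure rate for each year of age."""
    frail, robust = frail_fraction, 1.0 - frail_fraction
    afr = []
    for _ in range(years):
        # Expected failures this year, as a fraction of the original population.
        failed = frail * frail_rate + robust * robust_rate
        # AFR is measured against the drives still alive at the start of the year.
        afr.append(failed / (frail + robust))
        # Survivors carry over; frail drives thin out faster than robust ones.
        frail *= 1.0 - frail_rate
        robust *= 1.0 - robust_rate
    return afr

# Assumed mix: 30% frail drives failing at 30%/year, the rest at 3%/year.
rates = observed_afr(0.3, 0.30, 0.03, 5)
for year, rate in enumerate(rates, start=1):
    print(f"year {year}: AFR {rate:.1%}")
```

the population AFR drops every year here even though no individual
drive gets any more reliable: the frail ones simply die off and
leave the sample, so the old survivors look better than they are.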
> changes in air pressure also had a measurable effect. Low
> humidity cranks up static problems, high humidity can result
does anyone have data from the last decade on the conventional
wisdom about too-low humidity? I'm dubious that it matters in a
normal machineroom where components tend to stay put.
regards, mark hahn.