[Beowulf] Re: failure trends in a large disk drive population

Eugen Leitl eugen at leitl.org
Sun Feb 18 09:01:04 PST 2007

On Fri, Feb 16, 2007 at 06:19:19PM -0800, Joel Jaeggli wrote:

> Actually I'd bet that most of the 5400rpm disks would be Maxtor
> MaXLine II nearline drives; NetApp also used them in several filers.
> They were the first 300GB drive by a couple of months and came with a 5
> year warranty... I have several dozen of them, and for the most part
> they're still working, though the warranties are all expiring at this point.

I have two of these sitting here, to be installed tomorrow as replacements
for the couple that failed within a few months of each other and had to be
RMAed. They run pretty hot for 5400 rpm drives, maybe too many platters.
The failure was predicted by a rising number of SMART errors, until
smartd sent error reports via email, indicating impending failure.
The drives were in a 2x mini-ITX HA configuration in a Travla C147 case,
which was poorly ventilated -- now the systems are to be recycled as a
CARP cluster with the pfSense firewall, an embedded version
which boots from CF flash -- that effectively solved the thermal problem.

I wish Google's data included WD Raptors and Caviar RE2 drives.
I would really like to know whether these are worth the price premium
over consumer SATA. Btw -- smartd didn't seem to be able to handle
SATA, at least the last time I tried.


How do you folks gather data on them?
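
For anyone wanting to script this, here is a rough Python sketch, not a
tested monitoring setup. It assumes smartctl from smartmontools is
installed and can actually talk to the drive (on some SATA setups that
may need an extra option such as "-d ata"; that is an assumption about
your system, not something I verified). The function names and the list
of watched attributes are my own illustrative choices. It parses the
attribute table printed by "smartctl -A" and reports which watched raw
counters grew between two snapshots.

```python
import subprocess

# Illustrative choice of counters whose growth often precedes failure.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable")


def parse_attributes(text):
    """Map attribute name -> integer raw value from `smartctl -A` output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have >= 10 columns;
        # the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def growing(previous, current, watched=WATCHED):
    """Return {name: (old, new)} for watched counters that increased."""
    return {
        name: (previous.get(name, 0), current[name])
        for name in watched
        if name in current and current[name] > previous.get(name, 0)
    }


def read_attributes(device, extra_args=()):
    """Run smartctl on a device and parse its attribute table."""
    # smartctl's exit status is a bit mask, so a non-zero status is not
    # treated as an error here.
    result = subprocess.run(
        ["smartctl", "-A", *extra_args, device],
        capture_output=True, text=True, check=False,
    )
    return parse_attributes(result.stdout)
```

Run read_attributes("/dev/sda") from cron, keep the previous snapshot
around, and mail yourself whatever growing(previous, current) returns
when it is non-empty.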

Oh, and those of you who run GAMMA MPI on GBit Broadcoms, any
lockups? The SunFire X2100 seems to be supported by GAMMA (it has a
Broadcom and an nForce NIC; the X2100 M2 seems to have two Broadcoms
and two nVidia NICs), so I'd like to try it, but I'd rather not risk
a lockup.

