[Beowulf] Re: Cooling vs HW replacement

Jim Lux James.P.Lux at jpl.nasa.gov
Thu Jan 27 10:41:31 PST 2005

At 12:48 AM 1/27/2005, Karen Shaeffer wrote:
>On Sun, Jan 23, 2005 at 11:14:14PM -0800, Greg Lindahl wrote:
>These numbers are defined by their collective usage in the industry. I
>accept your assertion about their definitions. But the MTBF number has
>no consequential significance to a disk drive manufacturer, and thus
>has a poor confidence associated with it -- and I am going to explain
>As stated previously, the disk drive business is an extremely high
>volume, low margin, technology intensive business. Product cycles last
>about 6 months. A typical disk drive comes out of development and
>ramps up from zero units to several million units within about 6 weeks.
>This is an operational miracle in and of itself, but it is standard business
>in this industry.

<snip of an excellent discussion>

>I and others have asserted you cannot place much confidence in these
>numbers, because they have no financial consequence to the DDM. (Except
>of course if they are wildly wrong -- which brings with it the particular
>problem of being too late to do anything about it.)  I have explained why
>this is so. I have also explained how the DDM assigns all its resources to
>the critical problems, as the rate of production is so high, time is of the
>essence in protecting profits. Once production ends, all resources are
>reassigned to the next product to be released.
>It is my understanding that these MTBF numbers are derived from thermal
>cycling in ovens as part of the QA process. All the likely failure modes
>in a disk drive are quite sensitive to thermal conditions. These are the
>media, the heads, the spindle, bearings, lubricants, etc. comprising the
>critical mechanical structure, the temperature dependence of band gaps and
>other calibrating circuitry within the electronics, nominal currents within
>the microelectronics and especially the power MOSFET arrays, the servo
>system calibration, etc. As the thermal cycling QA processes proceed,
>defects in these systems can be forced to manifest during the testing, and
>the normal state characteristics and stability of these subsystems can also
>be extracted from the experiments. These results are then rigorously
>integrated within the observed profiles and characteristics of drives
>failing within the infant mortality window. It is all highly integrated
>within statistical models for expectations. MTBF numbers are also
>extrapolated from the results. In effect, the MTBF numbers become the long
>term projections that are extrapolated from this data. But the primary
>focus and optimization of processes is intended to create the statistical
>underpinning from which to analyze infant mortality drive failures. The
>uncertainty in these numbers naturally increases for the MTBF
>It's all perfectly logical.
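
[Editorial sketch, not part of the original exchange.] The extrapolation
described above, from oven testing to a long-term MTBF figure, can be
illustrated roughly as follows. This is only a sketch under stated
assumptions: the post does not say which model drive manufacturers use, so an
Arrhenius temperature-acceleration model and a constant failure rate are
swapped in here as common textbook choices. The function names, the
activation energy, and the test figures are all hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_use_c, t_stress_c, activation_energy_ev):
    """Acceleration factor of a stress temperature relative to a use temperature.

    Assumes a single thermally activated failure mechanism (Arrhenius model).
    The activation energy is an assumption the caller must supply.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_use_k - 1.0 / t_stress_k
    )
    return math.exp(exponent)


def mtbf_point_estimate(units, hours_per_unit, failures, acceleration):
    """Point estimate of MTBF in hours at use conditions.

    Equivalent device-hours at use conditions divided by observed failures.
    With few failures this estimate carries a very wide uncertainty.
    """
    if failures <= 0:
        raise ValueError("need at least one observed failure for a point estimate")
    return units * hours_per_unit * acceleration / failures


def annualized_failure_rate(mtbf_hours):
    """Probability that one drive fails within a year (8760 powered-on hours),
    assuming a constant failure rate, i.e. exponentially distributed lifetimes."""
    return 1.0 - math.exp(-8760.0 / mtbf_hours)


if __name__ == "__main__":
    # Hypothetical test: 1000 drives, 1000 hours each at 70 C, 3 failures,
    # extrapolated to 40 C with an assumed activation energy of 0.7 eV.
    af = arrhenius_acceleration(40.0, 70.0, 0.7)
    mtbf = mtbf_point_estimate(1000, 1000.0, 3, af)
    print(f"acceleration factor: {af:.1f}")
    print(f"extrapolated MTBF:   {mtbf:,.0f} hours")
    print(f"implied AFR:         {annualized_failure_rate(mtbf):.3%}")
```

Under the constant-failure-rate assumption, a quoted MTBF of 1,000,000 hours
corresponds to an annual failure probability of a little under 1% per drive.
The result is very sensitive to the assumed activation energy and to the
small failure count, which is one way to see why little confidence attaches
to the extrapolated figure.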

I can see where this process would be typical for quick-turnaround,
consumer-oriented drives.  However, there may be product lines that are
much longer lived; call them "professional" grade.  Maybe they aren't
really the same drive, just the same "model name", but it seems that
there are customers (e.g., the Defense Department) who expect to be able
to buy "exactly the same drive" for an extended period of time (several
years, at least), and that the manufacturers would accommodate them.

If I'm making, for instance, high end video editing systems that cost a 
million dollars, I'm probably not interested in saving a few bucks on the 
drives, but I AM interested in drives that last a long time, and that can 
be replaced easily with the same drive.  (I don't build these systems, 
maybe that's not their market model...)

The fast turnaround in modern electronics is a huge curse to those of us
developing systems with long lead times.  By the time the component is
tested and qualified (heck, even breadboarded to see if it's the "right"
component), it's obsolete and unavailable.  Not just disk drives, but things
like RAM, microprocessors, data converters, RF ICs, etc.

As far as warranties go... Here's an interesting quote from Seagate's website:
(note the identical product gets 1yr in Americas and 2yrs in EMEA countries)

What products are excluded from the new 5-year warranty?
The only products that are excluded are our retail external hard drives 
(external retail products, pocket drives, portable & compact flash drives). 
They are treated much more like a storage appliance and are used in very
different operating environments. We have a competitive one-year warranty
on external drives in the Americas, and a two-year warranty in the EMEA
countries.

James Lux, P.E.
Spacecraft Radio Frequency Subsystems Group
Flight Communications Systems Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109
tel: (818)354-2075
fax: (818)393-6875