[Beowulf] Third-party drives not permitted on new Dell servers?

Micha Feigin michf at post.tau.ac.il
Tue Feb 16 04:53:28 PST 2010


On Mon, 15 Feb 2010 20:41:08 -0500
Joe Landman <landman at scalableinformatics.com> wrote:

> Rahul Nabar wrote:
> > This was the response from Dell, I especially like the analogy:
> > 
> > [snip]
> >> There are a number of benefits for using Dell qualified drives in
> >> particular ensuring a ***positive experience*** and protecting
> >> ***our data***. While SAS and SATA are industry standards there are
> >> differences which occur in implementation.  An analogy is that
> >> English is spoken in the UK, US and Australia. While the language
> >> is generally the same, there are subtle differences in word usage
> >> which can lead to confusion. This exists in storage subsystems as
> >> well. As these subsystems become more capable, faster and more
> >> complex, these differences in implementation can have greater
> >> impact.
> > [snip]
> > 
> > I added the emphasis. I am in love with Dell disks that get me "the 
> > positive experience". :)
> 
> Please indulge my taking a contrarian view based upon the products we 
> sell/support/ship.
> 
> I see significant derision heaped upon these decisions, which are called 
> "marketing decisions" by Dell and others.  It couldn't be possible, in 
> most commenters' minds, that they might actually have a point ...
> 
> ... I am not defending Dell's language (I wouldn't use this or allow 
> this to be used in our outgoing marketing/customer communications).
> 
> Let me share an anecdote.  I have elided the disk manufacturer's name to 
> protect the guilty.  I will not give hints as to who they are, though 
> some may be able to guess ... I will not confirm.
> 
> We ship units with 2TB (and 1.5TB) drives among others.  We burn in and 
> test these drives.  We work very hard to ensure compatibility, and to 
> make sure that when users get the units, the things work.  We 
> aren't perfect, and we do occasionally mess up.  When we do, we own up 
> to it and fix it right away.  It's a different style of support.  The 
> buck stops with us.  Period.
> 
> So along comes a drive manufacturer, with some nice looking specs on 2TB 
> (and some 1.5 and 1 TB) drives.  They look great on paper.  We get them 
> into our labs, and play with them, and they seem to run really well. 
> Occasional hiccup on building RAIDs, but you get that in large batches 
> of drives.
> 
> So now they are out in the field for months, under various loads.  Some 
> in our DeltaV's, some in our JackRabbits.  The units in the DeltaV's 
> seem to have a ridiculously high failure rate.  This is not something we 
> see in the lab.  Even with constant stress, horrific sustained workloads 
> ... they don't fail in our testing.  But get these same drives out into 
> the users' hands ... and whammo.
> 
> Slightly different drives in our JackRabbit units, with a variety of 
> RAID controllers.  Same types of issues.  Timeouts, RAID fall outs, etc.
> 
> This is not something we see in the lab in our testing.  We try 
> emulating their environments, and we can't generate the failures.
> 
> Worse, we get the drives back after exchanging them at our cost with new 
> replacements, only to find out, upon running diagnostics, that the 
> drives haven't failed according to the test tool.  This failing drive 
> vendor refuses to acknowledge firmware bugs, effectively refuses to 
> release patches/fixes.
> 
> Our other main drive vendor, while not currently with a 2TB drive unit, 
> doesn't have anything like this manufacturer's failure rate in the field. 
>   When drives die in the field, they really ... really die in the field. 
>   And they do fix their firmware.
> 
> So we are now moving off this failing manufacturer (it's a shame as they 
> used to produce quality parts for RAID several years ago), and we are 
> evaluating replacements for them.  Firmware updates are a critical 
> aspect of a replacement.  If the vendor won't allow for a firmware 
> update, we won't use them.
> 
> So ... this anecdote complete, if someone called me up and said "Joe, I 
> really want you to build us an siCluster for our storage, and I want you 
>   to use [insert failing manufacturer's name here] drives because we 
> like them", what do you think my reaction should be?  Should it be 
> "sure, no problem, whatever you want" ... with the subsequent problems 
> and pain, for which we would be blamed ... or should it be "no, these 
> drives don't work well ... deep and painful experience at customer sites 
> shows that they have bugs in their firmware which are problematic for 
> RAID users ... we are attempting to get them to give us the updated 
> firmware to help the existing users, but we would not consider shipping 
> more units with these drives due to their issues."
> 
> Is that latter answer, which is the correct answer, a marketing answer?
>

But what if the customer tells you: "Ship me your system without drives. I'll
put whatever I want in there, so you are not my point of contact for failing
drives." And you answer: "No, I won't allow them in my system, and I won't even
sell you a replacement for the drives I do allow in the system"?
 
> Yeah, SATA and SAS are standards.  Yeah, in theory, they all do work 
> together.  In reality, they really don't, and you have to test. 
> Everyone does some aspect slightly differently, and usually in software, so 
> they can fix it if they messed up.  If there is a RAID timeout bug due 
> to head settling timing, yeah, this is fixable.  But if the disk 
> manufacturer doesn't want to fix it ...  it's your company's name on the 
> outside of that box.  You are going to take the heat for their problems.
> 
> Note:  This isn't just SATA/SAS drives, there are a whole mess of things 
> that *should* work well together, but do not.  We had some exciting 
> times in the recent past with SAS backplanes that refused to work with 
> SAS RAID cards.  We've had some excitement from 10GbE cards, IB cards, 
> etc. that we shouldn't have had.
> 
> I can't and won't sanction their tone to you ... they should have 
> explained things correctly.  Given that PERC are rebadged LSI, yeah, I 
> know perfectly well a whole mess of drives that *do not* work correctly 
> with them.
> 
> So please don't take Dell to task for trying to help you avoid making 
> what they consider a bad decision on specific components.  There could 
> be a marketing aspect to it, but support is a cost, and they want to 
> minimize costs.  Look at failure rates, and toss the suppliers who have 
> very high ones.
> 