[Beowulf] 32 nodes cluster price

Bill Broadley bill at cse.ucdavis.edu
Sun Oct 7 12:54:28 PDT 2007


Geoff Galitz wrote:
> 
> 
> Why do you automatically distrust hardware raid?

Because they are low-volume parts designed to handle failure modes in very 
complicated environments.  If you buy a hardware RAID card, you could very well 
have the only one on the planet with that exact config.  Variables include the 
RAID controller, the hardware revision of the controller, which drives you have 
(and their revision), the motherboard (and its BIOS version), etc.

So when a drive fails in a strange way, you might very well have a problem that 
nobody else on the planet has had.  Additionally, you have to gain expertise in 
the particular details, quirks, and bugs of that RAID controller.

The higher-end RAID setups of course do not let you pick your own drives, and 
some even change the firmware of the drives so they can guarantee that they have 
tested the various failure modes.  The even higher-end models put the disks in 
their own enclosure so they can control 100% of the drives' environment, 
including nasty little details like power quality, airflow/temperature, 
controller, vibration, etc.

I've seen a significant number of quirks in 3ware, StorageWorks, Areca, and 
Dell PERC (LSI Logic?) controllers.  The related forums discuss the numerous 
landmines hiding among their huge variety of options.  In one particular case I 
bought a 3ware 6800 (the then-current high-end 3ware), which was advertised as 
supporting RAID-5, and ended up losing a filesystem.  I called support; they 
said upgrade the firmware.  Which I did, and lost another filesystem.  I 
called back; they said oh, try a newer driver... which I did, and lost another 
filesystem.  They then gave a nervous laugh and said "Yeah, they do that, we 
recommend you buy the new 7xxx series; the 6800 wasn't really intended to run 
RAID-5."  Software RAID worked fine.

Linux software RAID, on the other hand, is popular, free, robust, and has likely 
already encountered any strange and wacky behavior from your motherboard, your 
revision of disks, or broken hardware.  There are likely 1000 times as many 
software RAIDs out in production as there are of any particular combination of 
RAID card, RAID firmware, RAID driver, disk hardware, and disk firmware.
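
For reference, setting one up is only a few commands with mdadm (a minimal
sketch; the device names and disk count are placeholders, adjust to taste):

   # create a 3-disk RAID-5 from one partition on each disk
   mdadm --create /dev/md0 --level=5 --raid-devices=3 \
         /dev/sdb1 /dev/sdc1 /dev/sdd1
   # watch the initial parity build in progress
   cat /proc/mdstat
   # record the array so it assembles at boot
   mdadm --detail --scan >> /etc/mdadm.conf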

Additionally, you have to buy TWO hardware RAID cards (a spare, since you
need identical hardware to read the disks if the controller dies), you often
end up with significantly less performance, and the following questions are
often rather hard to answer (for Linux software RAID, see the sketch after
this list):
* Can I migrate a RAID to another machine?
* Can I split disks into partitions, with different partitions in different
   RAIDs?
* Can I be emailed when the RAID changes state?
* Can I migrate the RAID to larger disks gradually (i.e. 2 250GB disks
   to 2 500GB disks without needing 4 slots/ports)?
* Can I control RAID rebuild speed?
* Can I enable ECC scrubbing on my own schedule?
* Can I migrate the RAID to completely different hardware to debug if it's a
   RAID controller issue?
* Can I grow/shrink RAID volumes, as well as the space used on each drive?
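
For md, most of those are one-liners.  A sketch, with the device names,
speeds, and email address as placeholders:

   # email me when an array changes state (degraded, rebuilt, etc.)
   mdadm --monitor --scan --mail=root@example.com --daemonise
   # throttle or unleash rebuilds (KB/sec per device)
   echo 50000 > /proc/sys/dev/raid/speed_limit_max
   # start a scrub whenever I like, e.g. from cron
   echo check > /sys/block/md0/md/sync_action
   # after swapping in bigger disks one at a time, claim the new space
   mdadm --grow /dev/md0 --size=max
   # moved the disks to another machine?  the superblocks say who's who
   mdadm --assemble --scan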

Sure, they can be answered, but frankly it takes more time than I'm willing
to invest in the flavor-of-the-month card, firmware, and Linux kernel
driver.  Especially since it's a small market, there seem to be dramatic
differences in price/performance among those trying to gain market share.
Way back it was Adaptec, then 3ware was the upstart, then Areca, and now
it seems like Adaptec is making a big push with their newer 16-port-ish
controllers.

I've found Linux software RAID almost always faster than hardware RAID, much
more reliable, and pleasingly consistent.  Uptimes on busy servers on a UPS 
are often over 500 days, even back when the Linux uptime counter still
rolled over.  During a disaster I'd much rather debug and troubleshoot a 
software RAID than try to find one of the few experts in the world on some 
particular hardware configuration.
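
To illustrate that last point: after moving the disks to any other Linux
box, the troubleshooting looks roughly like this (device names assumed):

   # read the md superblock off each member to see what it belongs to
   mdadm --examine /dev/sdb1
   # reassemble the array under its old identity
   mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
   # a member is dead?  start the array degraded and copy the data off
   mdadm --assemble --run /dev/md0 /dev/sdb1 /dev/sdc1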


