[Beowulf] 32 nodes cluster price

Mike Davis jmdavis1 at vcu.edu
Sun Oct 7 09:54:42 PDT 2007


And what would happen if 2 drives died on a software RAID5? The problem 
with the example is that it could happen whether one uses software or 
hardware RAID. The real issue is that important data was stored and not 
backed up. Bad things happen when you have a bad storage strategy.
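
To make the two-drive case concrete, here is a minimal sketch of 
single-parity (RAID5-style) reconstruction. It is only an illustration, 
not how any particular controller or the Linux md driver lays out data: 
the 4-drive stripe, the block contents, and the helper names 
(xor_blocks, rebuild, xor_of_missing) are all made up for the example. 
The point is that the arithmetic is the same for SW and HW RAID.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical 4-drive stripe: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]


def rebuild(stripe, lost):
    """Rebuild the single block at index `lost` from the surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost]
    return xor_blocks(survivors)


def xor_of_missing(stripe, lost_a, lost_b):
    """With two blocks gone, the survivors only give the XOR of the two
    missing blocks, which is not enough to recover either one."""
    survivors = [b for i, b in enumerate(stripe) if i not in (lost_a, lost_b)]
    return xor_blocks(survivors)


# One failed drive: its block comes back exactly.
assert rebuild(stripe, 1) == b"BBBB"

# Two failed drives: all we can compute is block0 XOR block1.
assert xor_of_missing(stripe, 0, 1) == xor_blocks([b"AAAA", b"BBBB"])
```

With one block missing, the survivors determine it exactly. With two 
missing, only their XOR is known, so a second failure during the rebuild 
leaves nothing to restore from except a backup.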

I have run HW RAID for well over a decade. I've used units from 
manufacturers and integrators. I've had HW from Apple, EMC, Sun, DEC, 
and IBM, as well as from small shops like Partners Data. I've also run 
SW RAID, primarily for less critical data. Regardless of which method I 
choose, making sure that there are regular, reliable backups is important.

Controllers can have problems, but so can software.

Mike Davis

Bill Rankin wrote:

>
> On Oct 5, 2007, at 4:17 PM, Leif Nixon wrote:
>
>> "Geoff Galitz" <geoff at galitz.org> writes:
>>
>>> Why do you automatically distrust hardware raid?
>>
>>
>> To some extent I share Mark's sentiment. I certainly trust the
>> Linux kernel more than the firmware in a cheap raid controller.
>
>
> Let me offer up a somewhat concrete example of a problem with  
> hardware raid.
>
> A local group around here kept some Very Important Data on a hardware  
> raid array.  Due to several factors, a backup was not made of certain  
> data.  The device lost a drive and started an automagic rebuild on  
> one of the hot spares.  The sudden beating that the other drives took  
> (because of the rebuild) caused a second hard drive to fail (always a  
> concern with RAID5).
>
> Since the data was not fully backed up, the drives were sent out for  
> a Very Expensive Recovery.  Most of the data was recovered but once  
> the drives were reinstalled in the enclosure, the hardware raid could  
> not be made to understand that all the drives were now okay.  It  
> essentially got itself into an unrecoverable state that could not be  
> changed by us mere mortals (since data formats and such on hardware  
> raid tend to be proprietary).  So the entire array had to be sent out  
> for another Even More Expensive Recovery to get the data back.
>
> Now while this is kind of a "perfect storm" in terms of hardware and  
> data failure, it does illustrate the extent of control that you give  
> up when going with a hardware raid solution.  I think that the higher  
> end vendors (e.g. NetApp, EMC, et al) have their reliability up to the  
> point where this is much less of a risk.  But for the low-end beer  
> budget cluster, software raid is probably still the way to go.  As  
> for the "mid-tier" vendors, I would be very cautious and pay close  
> attention to the worst case data loss scenario.
>
> Good luck,
>
> -bill