[Beowulf] 32 nodes cluster price

Joe Landman landman at scalableinformatics.com
Sun Oct 7 13:23:24 PDT 2007


Bill Rankin wrote:

> Let me offer up a somewhat concrete example of a problem with hardware 
> raid.
> 
> A local group around here kept some Very Important Data on a hardware 
> raid array.  Due to several factors, a backup was not made of certain 
> data.  The device lost a drive and started an automagic rebuild on one 

Let me state the obvious here.  And yes, I know I am likely "preaching 
to the choir".

RAID is not a backup solution.  Again, RAID is not a backup solution. 
If you run without a backup, we can pretty much guarantee that you are 
going to lose data at some point in time.  Again, RAID is not a backup 
solution.

I don't know if I mentioned it, but RAID is not a backup solution.

Anyone who believes otherwise is begging for trouble.  RAID is not a 
backup solution.

Backing up your data is *ALWAYS* important, RAID or not.  Even if it is 
just a mirror of the data.

> of the hot spares.  The sudden beating that the other drives took 
> (because of the rebuild) caused a second hard drive to fail (always a 
> concern with RAID5).

[... anecdote elided ...]

RAID is not a backup solution.  Anyone mistakenly using it as such 
*will* be burned.

> Now while this is kind of a "perfect storm" in terms of hardware and 
> data failure, it does illustrate the extent of control that you give up 
> when going with a hardware raid solution.  I think that the higher end 

Er... with all due respect, this wasn't a hardware issue.  This was a 
policy issue.

If your data is important, back it up.  It doesn't matter whether it is 
on hardware or software RAID: you absolutely, positively must do a 
cost-benefit analysis of the value of the data and the time/effort/money 
it would cost to recover when (not if) something goes bump in the night.
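
For what it is worth, the cost-benefit arithmetic can be sketched in a 
few lines.  This is only an illustration: the function names and every 
number below are hypothetical, and a real analysis would also have to 
price downtime and data that cannot be recreated at any cost.

```python
def expected_annual_loss(p_loss: float, recovery_cost: float) -> float:
    """Expected yearly cost of data loss: probability times cost to recover."""
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("p_loss must be a probability between 0 and 1")
    return p_loss * recovery_cost


def backup_pays_off(p_loss: float, recovery_cost: float,
                    backup_cost: float, restore_cost: float) -> bool:
    """Compare expected yearly cost with and without a backup.

    Without a backup, a loss means paying the full recovery cost.
    With a backup, we pay for the backup every year and pay the
    (smaller) restore cost when a loss happens.
    """
    without_backup = expected_annual_loss(p_loss, recovery_cost)
    with_backup = backup_cost + expected_annual_loss(p_loss, restore_cost)
    return with_backup < without_backup


if __name__ == "__main__":
    # Hypothetical inputs: 5% chance of loss per year, 200k to recreate
    # the data from scratch, 3k per year for backups, 2k to restore.
    print(backup_pays_off(0.05, 200_000, 3_000, 2_000))
```

The point is not the specific numbers, which you have to supply for 
your own site, but that the comparison is cheap to write down.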

RAID is not a backup solution.  Not sure I mentioned this.

All hardware has failure modes.  All software has bugs.  Your choice is 
which set of problems is easier to deal with.  We have seen crappy 
hardware, and abominable software.  Bugs in the Linux kernel (no, there 
couldn't be any, nah... impossible ...) could just as easily wreck your 
day as a misguided firmware/hardware bug.

Backups are a risk mitigation strategy.  If you have important data, you 
need to back it up.  Moreover, I argue that you need multiple modalities 
of backup/restore.  Call this 20+ years of experience in losing data and 
thinking (naively) that the backup that I have will actually restore... 
properly.
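
One way to stop trusting a backup blindly is to do a test restore into 
a scratch directory and compare it against the live data.  A minimal 
sketch, assuming both trees are mounted as ordinary directories; the 
function names are made up for illustration, and files that changed 
after the backup ran will naturally show up as mismatches.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each regular file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(source: Path, restored: Path) -> dict[str, list[str]]:
    """Compare a source tree with a test restore of its backup."""
    src, dst = tree_digests(source), tree_digests(restored)
    return {
        # Files in the source that the restore did not bring back.
        "missing": sorted(set(src) - set(dst)),
        # Files in the restore that the source does not have.
        "unexpected": sorted(set(dst) - set(src)),
        # Files present in both whose contents differ.
        "corrupt": sorted(k for k in src.keys() & dst.keys()
                          if src[k] != dst[k]),
    }
```

An empty list under every key means the restore matched the source at 
the time of the check; anything else is worth finding out about before 
the night something goes bump.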

> vendors (ie. NetApp, EMC, et al) have their reliability up to the point 
> where this is much less of a risk.  But for the low-end beer budget 

Er... ah... ok.  All of them have similar issues.  I occasionally hear 
how vendor X's (make the appropriate substitution for X) item, such as a 
network card, or disk drive is *obviously* much better than what is 
available in the mass market, which is why they charge so much more for 
it.  The last time a customer noted that about one of the above named 
vendors (network card as it turned out), I asked them to pull back the 
label on the card and see what was underneath it.  Turns out it was a 
plain old mass market card with a (vendor X) label slapped on it.  I am 
sorry to report that for the vast majority of cases of which I am aware, 
they (the above named vendors X) use generally the same mass market 
stuff you and I do.

Don't mistake this: EMC, NetApp and others *do* offer value.  It just 
isn't in slapping a new label on something, charging 10x for it, and 
somehow convincing the people paying for it that it is magically special 
(that is, unless their label maker has some serious undocumented mojo in 
that label ...).  Their value is in hyperactive support.

> cluster, software raid is probably still the way to go.  As for the 
> "mid-tier" vendors, I would be very cautious and pay close attention to 
> the worst case data loss scenario.

What we tell all our customers (aside from RAID is not a backup 
solution) is that they want to minimize risk.  Where is the risk?  Well, 
you can trace it out.  There are many ways to mitigate risk and reduce 
down time.  RAIN (a redundant array of inexpensive nodes) is a great 
example.

But you can build RAIN out of software RAID as easily as hardware RAID. 
Remember, all have bugs.  Your job is to figure out (or work with 
someone who does this for you) how to reduce the impact of potential 
bugs.  RAID is not a backup, and if you run without one, well, ...


> Good luck,

... yeah.

> 
> -bill


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


