[Beowulf] Surviving a double disk failure
Marian Marinov
mm at yuhu.biz
Mon Apr 13 00:56:28 PDT 2009
On Friday 10 April 2009 23:15:54 David Mathog wrote:
> Billy Crook <billycrook at gmail.com> wrote:
> > As a very,
> > very general rule, you might put no more than 8TB in a RAID5, and no
> > more than 16TB in a RAID6, including what's used for parity, and
> > assuming magnetic enterprise/RAID drives. YMMV. Test all new drives,
> > keep good backups, etc.
>
> Thankfully I don't have to do this myself, not having data anywhere near
> that size to cope with, but it seems to me that backing up a nearly full
> 16TB RAID is likely to be a painful, expensive exercise.
>
> Going with tape first...
>
> The fastest tape drives that I know of are Ultrium 4s, at 120 MB/s. In
> theory that could copy 1GB every 8.3 seconds and 1TB every 8300 seconds
> (i.e., 138 minutes, or a bit over 2 hours), so for that 16TB data set,
> about 37 hours. Except that no single tape has that capacity; the
> largest listed is still 800 GB, so it would take 20 tapes. And actually
> obtaining a sustained 120 MB/s from the RAID to the tape is likely to
> be extremely challenging. In any case, it looks like this calls for a
> tape robot of some sort, with many drives in it. Not cheap. On the
> plus side, transporting a box of 20 tape cartridges to "far away" is
> not particularly difficult, and they are fairly impervious to abuse
> during shipment.
>
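
Just to make this arithmetic easy to redo with other numbers, here is the
same estimate in a few lines of Python (the drive speed and cartridge
capacity are simply the LTO-4 figures quoted above):

    # Rough tape-backup estimate using the LTO-4 numbers from above.
    import math

    data_tb    = 16      # size of the array, TB
    drive_mb_s = 120     # Ultrium 4 sustained rate, MB/s
    tape_gb    = 800     # native LTO-4 cartridge capacity, GB

    hours = data_tb * 1e6 / drive_mb_s / 3600   # 1 TB = 1e6 MB
    tapes = math.ceil(data_tb * 1000 / tape_gb)
    print("%.0f hours of streaming, %d cartridges" % (hours, tapes))
    # -> about 37 hours, 20 cartridges
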
> The other obvious option is to replicate the RAID. If the duplicate
> RAID is on site, connected by a 1000BASE-T network, one could obtain a
> very similar transfer rate, and a full backup would take about as long
> as with the single tape drive (neglecting rewind and cartridge-change
> times), at the expense of still losing all the data in some sort of
> sitewide disaster. I can imagine (and suspect somebody has already
> implemented) a specialized disk->disk interconnect, such that one would
> plug RAID A into RAID B and all N disks in A could copy themselves in
> parallel onto all N disks in B at full speed. Assuming 1TB disks and a
> sustained 75 MB/s read from A and write to B, the whole copy would be
> done in about 222 minutes. Not exactly the blink of an eye, but a heck
> of a lot better than 37 hours. Placing the backup RAID physically
> offsite would improve the odds of the data surviving, but reduce the
> bandwidth available, and moving the copied RAID physically offsite
> after each backup is a recipe for short disk lives.
>
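
And the corresponding estimate for the hypothetical parallel disk-to-disk
copy, where all N disks stream simultaneously and the elapsed time is
therefore set by a single disk (the 75 MB/s sustained rate is the
assumption made above):

    # Parallel disk->disk copy: every disk copies at once, so the
    # elapsed time is one disk's capacity over its transfer rate.
    disk_tb   = 1     # per-disk capacity, TB
    rate_mb_s = 75    # assumed sustained read/write rate, MB/s

    minutes = disk_tb * 1e6 / rate_mb_s / 60
    print("%.0f minutes per full copy" % minutes)   # -> about 222
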
> Since all of the obvious options are so slow, I expect most sites are
> doing incremental backups. Which is fine, until the day comes when one
> has to restore the entire data array from two years' worth of
> incremental backups. Or maybe folks carry the incremental tape backups
> to the offsite backup RAID and apply them there?
>
> Is there an easier/faster/cheaper way to do all of this?
I had a client for whom we set up two servers in two different physical
locations with a good interconnect between them (1 Gbit/s). Both servers
had identical hardware (RAID5 with 8x 1TB disks, one hot spare, and two
NICs, one dedicated to backup traffic and one for system use).
What I did was set up a DRBD device between the two machines, so that if
there is a power outage or other disaster at the first location, they have
another server 20km away serving their data (this includes MySQL,
PostgreSQL, and files).
This setup is used both as a backup (DR) and for failover.
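
For reference, the replication piece of such a setup needs nothing exotic.
A minimal DRBD resource stanza along these lines would do it (hostnames,
addresses, and device paths here are placeholders, not the client's actual
configuration; this assumes DRBD 8.x syntax):

    resource r0 {
        protocol C;               # synchronous replication
        syncer { rate 100M; }     # cap resync traffic on the 1Gbit/s link
        on primary-host {
            device    /dev/drbd0;
            disk      /dev/md0;   # the local RAID5 array
            address   10.0.0.1:7788;
            meta-disk internal;
        }
        on standby-host {
            device    /dev/drbd0;
            disk      /dev/md0;
            address   10.0.0.2:7788;
            meta-disk internal;
        }
    }

The filesystem sits on /dev/drbd0 rather than directly on the array, and
the standby can be promoted with "drbdadm primary r0" if the primary site
is lost.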
Regards
Marian Marinov
Head of System Operations at Siteground.com