[Beowulf] Surviving a double disk failure

Joshua Baker-LePain jlb17 at duke.edu
Fri Apr 10 13:48:41 PDT 2009

On Fri, 10 Apr 2009 at 1:15pm, David Mathog wrote:

> Thankfully I don't have to do this myself, not having data anywhere near
> that size to cope with, but it seems to me that backing up a nearly full
> 16TB RAID is likely to be a painful, expensive exercise.
> Going with tape first...
> The fastest tape drives that I know of are Ultrium 4's at 120 MB/s.  In
> theory that could copy 1GB every 8.3 seconds, 1TB every 8300 seconds
> (AKA 138 minutes, or a bit over 2 hours), and for that 16 TB data set,
> about 37 hours.  Except that there is no tape with that capacity; the
> max listed is still 800 GB, so it would take 20 tapes.  And actually
> sustaining 120MB/s from the RAID to the tape is likely to be
> extremely challenging.  In any case, it looks like this calls for a tape
> robot of some sort, with many drives in it.  Not cheap.  On the plus
> side, transporting a box of 20 tape cartridges to "far away" is not
> particularly difficult, and they are fairly impervious to abuse during
> shipment.
> Since all of the obvious options are so slow, I expect most sites are
> doing incremental backups.  Which is fine, until the day comes when one
> has to restore the entire data array from two years' worth of
> incremental backups.  Or maybe folks carry the tape incremental backups
> to the offsite backup RAID and apply them there?
> Is there an easier/faster/cheaper way to do all of this?
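
The arithmetic in the quoted message can be checked with a short sketch.
It assumes decimal units (1 TB = 1000 GB), a single drive running at its
full native rate with no compression, and no time lost to tape changes.
The function name is made up for illustration.

```python
import math

def full_backup_estimate(data_tb, drive_mb_per_s, tape_gb):
    """Return (hours, tapes) for one full pass at a sustained native rate.

    Assumes decimal units: 1 TB = 1000 GB = 1,000,000 MB.
    """
    data_mb = data_tb * 1_000_000
    seconds = data_mb / drive_mb_per_s
    hours = seconds / 3600
    # Round up: a partly filled tape is still a whole tape.
    tapes = math.ceil(data_tb * 1000 / tape_gb)
    return hours, tapes

# Figures from the quoted message: 16 TB, 120 MB/s, 800 GB per tape.
hours, tapes = full_backup_estimate(16, 120, 800)
print(f"{hours:.1f} hours, {tapes} tapes")
```

With those inputs this comes out to roughly 37 hours and 20 tapes for a
single drive, and real sustained throughput would likely be lower.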

I currently back up a bit more than 16TB to an LTO3 library and don't find
it that painful.  I use AMANDA and break the data down into bite-size 
chunks.  AMANDA handles spreading these chunks out over the whole backup 
cycle, so that each night's backup is about the same size (and so takes 
about the same amount of time).  Each "chunk" gets a full dump once per 
cycle and incrementals in between.  Yes, there is some bookkeeping to be 
done when the "chunks" change drastically in size.  But it really does 
mostly run itself.  A full restore would indeed take some time, but this 
is academia.  We have time.
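
To illustrate the idea of spreading full dumps over the cycle so each
night is about the same size, here is a toy sketch.  It is NOT AMANDA's
actual planner, just a simple greedy heuristic (largest chunk first, onto
the lightest night), and it ignores incrementals entirely.  The chunk
names, sizes, and the 3-night cycle are all hypothetical.

```python
import heapq

def spread_full_dumps(chunk_sizes_gb, cycle_nights):
    """Assign each chunk's full dump to one night of the cycle.

    Greedy longest-first heuristic: take chunks largest first and place
    each on the night with the least data scheduled so far.
    """
    # Heap of (total_gb_so_far, night_index); smallest total pops first.
    nights = [(0, n) for n in range(cycle_nights)]
    heapq.heapify(nights)
    schedule = {n: [] for n in range(cycle_nights)}
    for name, size in sorted(chunk_sizes_gb.items(), key=lambda kv: -kv[1]):
        total, night = heapq.heappop(nights)
        schedule[night].append(name)
        heapq.heappush(nights, (total + size, night))
    return schedule

# Hypothetical chunk names and sizes in GB, for illustration only.
chunks = {"home_a": 900, "home_b": 700, "proj1": 600, "proj2": 500,
          "scratch": 400, "archive": 300, "www": 200}
schedule = spread_full_dumps(chunks, cycle_nights=3)
totals = {n: sum(chunks[c] for c in names) for n, names in schedule.items()}
print(schedule)
print(totals)
```

The nightly totals end up close to each other, which is the property
that keeps each night's run about the same length.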

As for cost, no, the library wasn't cheap (though it wasn't ridiculously
expensive either -- again, academia).  Once that's acquired, though,
adding capacity (tapes) is cheap and easy.

Joshua "been using AMANDA for a *long* time" Baker-LePain
QB3 Shared Cluster Sysadmin
