[Beowulf] Surviving a double disk failure

Fri Apr 10 14:46:32 PDT 2009

Joshua Baker-LePain  wrote:

> I currently backup a bit more than 16TB to an LTO3 library and don't find 
> it that painful.  I use AMANDA and break the data down into bite-size 
> chunks.  AMANDA handles spreading these chunks out over the whole backup 
> cycle, so that each night's backup is about the same size (and so takes 
> about the same amount of time).  Each "chunk" gets a full dump once per 
> cycle and incrementals in between. 

Is a lot of your data static and/or in "small" files?  Would your backup
method work if the RAID held many large files, most of which were
modified each day?

The biggest data set I maintain is stored in an Oracle database.  There
isn't a huge amount of data going into it, but whatever does go in
modifies pretty much all of the database files, making any file-level
incremental dump pointless.  We get away with shutting it down and doing
a full level 0 ufsdump (yes, it is Solaris) of each file system once a
week.  Crude, but it runs when the users are, or at least should be,
asleep, so it doesn't interfere with their work.  Clearly, in a more
time-sensitive environment (for instance, if these were AT&T's telephone
billing records), we would have to use the database-level backup tools
to avoid taking the DB offline.
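
To put a rough number on the incremental-dump point, here is a minimal
sketch, not anything we actually run.  It assumes an incremental is
chosen purely by comparing each file's modification time against the
time of the last backup; the FileInfo type, the function name and the
file names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str          # hypothetical file name, for illustration only
    size_bytes: int
    mtime: float       # modification time, seconds since the epoch


def incremental_fraction(files, last_backup_time):
    """Share of the total bytes that a file-level incremental would copy.

    A file counts as changed if it was modified after the last backup,
    no matter how few bytes inside it actually differ.
    """
    total = sum(f.size_bytes for f in files)
    if total == 0:
        return 0.0
    changed = sum(f.size_bytes for f in files if f.mtime > last_backup_time)
    return changed / total


# Database-like case: every data file was touched since the last backup.
db_files = [
    FileInfo("data01.dbf", 100, 200.0),
    FileInfo("data02.dbf", 300, 250.0),
]

# Mostly static case: only the small file was touched.
static_files = [
    FileInfo("notes.txt", 100, 200.0),
    FileInfo("archive.tar", 300, 50.0),
]

db_share = incremental_fraction(db_files, last_backup_time=100.0)
static_share = incremental_fraction(static_files, last_backup_time=100.0)
```

With every file touched, db_share comes out as 1.0, so the "incremental"
is as large as a full dump; in the mostly static case static_share is
0.25.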

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech