[Beowulf] OT: recoverable optical media archive format?

Dries Kimpe dkimpe at mcs.anl.gov
Wed Jun 9 04:05:13 PDT 2010


* David Mathog <mathog at caltech.edu> [2010-06-08 10:44:55]:

> This is off topic so I will try to keep it short:  is there an
> "archival" format for large binary files which contains enough error
> correction so that all original data may be recovered even if there is a
> little data loss in the storage media?  

> For my purposes these are disk images, sometimes .tar.gz, other times
> gunzip -c of dd dumps of whole partitions which have been "cleared" by
> filling the empty space with one big file full of zeros and then deleting
> that file.  I'm thinking of putting this information on DVDs (only
> need to keep it for a few years at a time) but I don't trust that media
> not to lose a sector here or there - having watched far too many
> scratched DVD movies with playback problems.

> Unlike an SDLT with a bad section, the good parts of a DVD are still
> readable when there is a bad block (using dd or ddrescue) but of course
> even a single missing chunk makes it impossible to decompress a .gz file
> correctly.  So what I'm looking for is some sort of .img.gz.ecc format,
> where the .ecc puts in enough redundant information to recover the
> underlying img.gz even when sectors or data are missing.   If no such
> tool/format exists then two copies should be enough to recover all of an
> .img.gz so long as the same data wasn't lost on both media, and if bad
> DVD sectors always come back as "failed read", never ever showing up as
> a good read but actually containing bad data.  Perhaps the frame
> checksum on a DVD is enough to guarantee that?
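
A minimal sketch of the two-copy idea, assuming that bad sectors always
show up as failed reads and that the failed sector numbers of each copy
are known (for example from the errors reported by dd or ddrescue). The
function name merge_copies and the bad-sector sets are hypothetical and
only serve the illustration.

```python
SECTOR = 2048  # user-data bytes per DVD sector


def merge_copies(copy_a, bad_a, copy_b, bad_b, sector=SECTOR):
    """Merge two images of the same data, sector by sector.

    bad_a and bad_b are sets of sector indices that failed to read
    from the respective copy (assumed known, not detected here).
    Returns (merged_bytes, list_of_sectors_lost_on_both_copies).
    """
    if len(copy_a) != len(copy_b):
        raise ValueError("images must have the same length")
    merged = bytearray()
    lost = []
    count = (len(copy_a) + sector - 1) // sector
    for i in range(count):
        lo, hi = i * sector, (i + 1) * sector
        if i not in bad_a:
            merged += copy_a[lo:hi]
        elif i not in bad_b:
            merged += copy_b[lo:hi]
        else:
            # Lost on both media: fill with zeros and report it.
            merged += bytes(len(copy_a[lo:hi]))
            lost.append(i)
    return bytes(merged), lost
```

If the returned list is empty, every sector was readable on at least one
copy and the merged image is complete. The sketch does nothing about a
sector that reads "successfully" but contains bad data; catching that
would need a separate checksum.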

You should also consider protecting the filesystem metadata: what good
are split files, correction data, and so on if the file can no longer be
found because the damaged sector was in the directory metadata rather
than in the actual file data?

RAR has tunable 'recovery record' support (you can pick how much space
you want to sacrifice to recovery). You could pack everything in a RAR
file (with recovery records turned on) and write the whole file
directly to the DVD (i.e. using dd or growisofs -Z /dev/dvd=rarfile),
so that there is no filesystem metadata on the disc to lose.

The downside is that the file size will not be preserved, so you'd have
to check whether unrar can deal with this or whether it requires the
file size to be known. A quick test with a small RAR archive seems to
indicate that it does not care if extra data is added at the end of the
file.
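
To illustrate why the size changes: a raw DVD is read back in whole
2048-byte sectors, so the image comes back rounded up to the next
sector boundary. The sketch below simulates that, assuming the padding
is zero bytes, and shows how a separately recorded original size would
let a tool that does care strip the padding again. The function names
are hypothetical.

```python
SECTOR = 2048  # user-data bytes per DVD sector


def padded_size(size, sector=SECTOR):
    """Size of the data as read back from the raw device."""
    return ((size + sector - 1) // sector) * sector


def pad_to_sector(data, sector=SECTOR):
    """Simulate a raw read-back: data plus zero padding (an assumption)."""
    return data + bytes(padded_size(len(data), sector) - len(data))


def strip_padding(readback, original_size):
    """Recover the exact original bytes, given the recorded size."""
    if original_size > len(readback):
        raise ValueError("read-back is shorter than the recorded size")
    return readback[:original_size]
```

For example, a 3000-byte archive would read back as 4096 bytes, and
strip_padding(readback, 3000) would return the original 3000 bytes.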

   Dries
