[Beowulf] OT: recoverable optical media archive format?

David Mathog mathog at caltech.edu
Thu Jun 10 14:43:21 PDT 2010


Scott Atchley wrote:
> I have never used this tool, but I would wonder if your pockmark tool
> damaged the rsbep metadata, specifically one or more of the metadata
> segment lengths. Bear in mind that corruption of the metadata is not
> beyond the realm of possibility, but I assume that the rsbep metadata is
> not replicated or otherwise protected.

pockmark just stomps on random parts of the file, so the metadata is as
open to destruction as anything else.  Presumably that shouldn't be an
issue for this sort of program though - the metadata should also be
protected in some manner.
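
For anyone who wants to repeat this kind of test, here is a minimal Python
sketch of a pockmark-style damage tool. It is not the actual pockmark
program; the function name, its parameters, and the choice to XOR each byte
with a nonzero value are all assumptions made for illustration. It damages
randomly placed runs of bytes with no regard for what they hold, so any
header or length field is hit as readily as the payload.

```python
import random


def pockmark_sketch(data: bytes, n_hits: int, hit_len: int, seed: int = 0) -> bytes:
    """Return a copy of data with n_hits randomly placed runs of damage.

    Hypothetical stand-in for a damage tool: each byte in a run is XORed
    with a nonzero value, so a single run always changes every byte it
    touches. The output has the same length as the input.
    """
    rng = random.Random(seed)  # seeded so a test run can be repeated
    buf = bytearray(data)
    if not buf or hit_len <= 0:
        return bytes(buf)
    for _ in range(n_hits):
        # Pick a start so the run fits inside the buffer when possible.
        start = rng.randrange(0, max(1, len(buf) - hit_len + 1))
        for i in range(start, min(start + hit_len, len(buf))):
            buf[i] ^= rng.randrange(1, 256)
    return bytes(buf)
```

Runs may overlap when n_hits is greater than one, so the number of damaged
bytes is at most n_hits * hit_len.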

> > In any case, bunzip2 was able to handle the crud on the end, but this
> > would have been a problem for other binary files.
> 
> This is most likely a requirement of the underlying Reed-Solomon
> library that requires equal length blocksizes. If your original file is
> N bytes and N % M != 0 where M is the blocksize, I imagine it pads the
> last block with 0s so that it is N bytes. It should not affect bunzip
> since the length is encoded in the file and it ignores anything tacked
> onto the end.

bunzip2 was not affected, but it is not a good thing to do in general.
Not all binary files will be functionally equivalent after null bytes
are added on the end!
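
One common way around this is to store the true length alongside the data
and trim the padding on restore. The Python sketch below shows the idea
only. It is not how rsbep works; the block size of 255, the 8-byte
big-endian length prefix, and both function names are assumptions made for
illustration.

```python
import struct

BLOCK = 255  # hypothetical block size M, chosen only for this example


def pad_with_length(payload: bytes, block: int = BLOCK) -> bytes:
    """Prefix the true length, then zero-pad to a whole number of blocks."""
    framed = struct.pack(">Q", len(payload)) + payload  # 64-bit length field
    shortfall = -len(framed) % block  # bytes needed to reach a block boundary
    return framed + b"\x00" * shortfall


def unpad_with_length(framed: bytes) -> bytes:
    """Recover exactly the original bytes, dropping the zero padding."""
    (length,) = struct.unpack(">Q", framed[:8])
    return framed[8:8 + length]


payload = b"data ending in real nulls\x00\x00"
padded = pad_with_length(payload)
restored = unpad_with_length(padded)  # same bytes, same length as payload
```

Because the length travels with the data, trailing nulls that belong to the
file are kept and the padding nulls are dropped, so the restored file has
the same size as the original. The length field itself would of course need
the same protection as the rest of the data.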

> 
> A quick glance at his website, it claims that the length should be the
> same. He only shows, however, the md5sums and not the ls -l output.

I forwarded my observations to the program's author and suggested that
if I ran the program incorrectly, or he finds these really are bugs,
that he post back here with corrections.

I tried rsbep again with a test file of 81920000 bytes (well within the
range of a 32-bit unsigned integer, whereas the first test file's size in
bytes exceeded that range), but similar problems arose.  One difference:
for the smaller test file the restored files were 240842 bytes bigger,
not 97535 as before.

My guess is that since the program dates back to the age of very small
media it may be using "int" or "long" in locations where "long long" is
needed today.
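
To make the concern concrete, here is a small Python sketch that imitates
what happens to a byte count stored in a 32-bit C integer. The helper names
and the 5000000000-byte and 3000000000-byte sizes are hypothetical (they are
not the sizes of my test files), and nothing here is taken from the rsbep
source.

```python
UINT32_MAX = 2**32 - 1  # 4294967295, largest value a 32-bit unsigned holds


def as_uint32(n: int) -> int:
    """Value that survives when n is stored in a 32-bit unsigned integer."""
    return n & 0xFFFFFFFF


def as_int32(n: int) -> int:
    """Value that survives when n is stored in a 32-bit signed integer
    (two's complement wrap-around, as on typical hardware)."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n


for size in (81920000, 3000000000, 5000000000):
    print(size, as_uint32(size), as_int32(size))
```

A size of 5000000000 wraps to 705032704 as a 32-bit unsigned value, and
3000000000 turns negative as a 32-bit signed value, while sizes below 2^31
such as 81920000 pass through unchanged.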

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


