availability of Memory compression routine
Mark Hahn
hahn at physics.mcmaster.ca
Fri Jul 19 10:02:48 PDT 2002
> the binary data resulted in only slight compression. Gzipping an
> ascii form of the same data reduced it quite a bit, in some cases
> smaller than the gzipped binary. So, the point here is that gzip
> is probably not appropriate.
as you probably know, gzip (and compress) are based on finding repeated
strings of bytes in the input, building a dictionary of them, and
encoding later occurrences of those strings as dictionary indices.
this could easily be extended to other atomic types, rather than bytes,
assuming you have at least some repeated values. even a handful
of repeated values would help, though of course you'd REALLY
win if you had strings of repeated values. the simplest case of this
would be run-length encoding.
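
just to make that concrete, here's a toy run-length encoder over
doubles (a made-up sketch for illustration, not code from any real
library): each run of identical values becomes a (count, value) pair.

    #include <stdio.h>

    /* toy run-length encoder over doubles: each run of identical
       values becomes a (count, value) pair.  only pays off if the
       data really does contain runs of repeated values. */
    size_t rle_encode(const double *in, size_t n,
                      unsigned *counts, double *values)
    {
        size_t nruns = 0;
        for (size_t i = 0; i < n; ) {
            size_t j = i + 1;
            while (j < n && in[j] == in[i])
                j++;
            counts[nruns] = (unsigned)(j - i);
            values[nruns] = in[i];
            nruns++;
            i = j;
        }
        return nruns;
    }

    int main(void)
    {
        double data[] = { 3.14, 3.14, 3.14, 42.0, 42.0, 69.0 };
        unsigned counts[6];
        double values[6];
        size_t nruns = rle_encode(data, sizeof data / sizeof data[0],
                                  counts, values);
        for (size_t i = 0; i < nruns; i++)
            printf("%u x %g\n", counts[i], values[i]);
        return 0;
    }

obviously this only wins when the values really do come in runs;
otherwise it roughly doubles the size.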
again, the question is whether you know of some pattern in your data.
for instance, perhaps all your values have the same exponent,
or as someone else mentioned, maybe you can afford to lose the
bottom 16 bits of the mantissa, or simply use floats rather than doubles.
if your matrix is always composed of 3.14, 69, and 42, you should
be able to code it down to 1-2 bits per element, rather than 64 bits...
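
for instance (again a made-up sketch, with the codebook hardwired just
to show the idea): map each element to an index into a tiny table of
the distinct values and pack the indices four to a byte.

    #include <stdint.h>
    #include <stdio.h>

    /* toy codebook coder: if every element is one of a few known
       values, store a 2-bit index per element instead of a 64-bit
       double.  four indices get packed into each output byte. */
    static const double codebook[3] = { 3.14, 69.0, 42.0 };

    size_t pack2bit(const double *in, size_t n, uint8_t *out)
    {
        size_t nbytes = (n + 3) / 4;
        for (size_t i = 0; i < nbytes; i++)
            out[i] = 0;
        for (size_t i = 0; i < n; i++) {
            unsigned idx = 0;
            while (idx < 3 && in[i] != codebook[idx])
                idx++;
            /* assumes every input value really is in the codebook */
            out[i / 4] |= (uint8_t)(idx << (2 * (i % 4)));
        }
        return nbytes;
    }

    int main(void)
    {
        double data[] = { 3.14, 42.0, 42.0, 69.0, 3.14 };
        uint8_t packed[2];
        size_t nbytes = pack2bit(data, 5, packed);
        printf("5 doubles (40 bytes) packed into %zu bytes\n", nbytes);
        return 0;
    }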
> Is there any more appropriate "profile" or "field" compression schemes
> available that leverage this type of differential encoding? If not,
> it might make any interesting research project for parallel processing
> as well as data storage, and visualization applications.
I once played around with encoding MRI (fMRI actually) images
first with a 2D differential filter, then a simple static Huffman tree.
it seemed to work well, giving files about a quarter of the original size,
but since disk is so cheap... of course, that was based on the fact
that these images have relatively small local dynamic range (only
16 bits to start with), and are often somewhat smooth. I toyed with the
idea of compressing them in their source space (k/Fourier space), too.
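
the differential step is the sort of thing sketched below (illustrative
only, not the code I'm describing above): replace each pixel with its
difference from a neighbour, so that on smooth 16-bit images the
residuals cluster near zero and a static Huffman tree can code most of
them in a few bits.

    #include <stdio.h>
    #include <stdint.h>

    /* simple differential (predictive) filter on a 16-bit image:
       each pixel is replaced by its difference from the pixel to its
       left (first column: the pixel above).  differences wrap mod
       2^16, so the filter is exactly reversible. */
    void delta_filter(const uint16_t *img, uint16_t *res, int w, int h)
    {
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                uint16_t p = img[y * w + x];
                uint16_t pred;
                if (x > 0)
                    pred = img[y * w + x - 1];   /* left neighbour */
                else if (y > 0)
                    pred = img[(y - 1) * w];     /* first column: pixel above */
                else
                    pred = 0;                    /* top-left pixel stored raw */
                res[y * w + x] = (uint16_t)(p - pred);
            }
        }
    }

    int main(void)
    {
        uint16_t img[16], res[16];
        for (int i = 0; i < 16; i++)             /* tiny synthetic smooth ramp */
            img[i] = (uint16_t)(1000 + 3 * i);
        delta_filter(img, res, 4, 4);
        for (int i = 0; i < 16; i++)
            printf("%u ", (unsigned)res[i]);
        printf("\n");
        return 0;
    }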