[Beowulf] Accelerator for data compressing
Bill Broadley
bill at cse.ucdavis.edu
Fri Oct 3 02:17:52 PDT 2008
Vincent Diepeveen wrote:
> Bzip2, gzip,
>
> Why do you guys keep quoting those total outdated compressors :)
Path of least resistance, not to mention Python bindings.
> there is 7-zip for linux, it's open source and also part of LZMA. On average
> remnants are 2x smaller than what gzip/bzip2 is doing for you (so bzip2/gzip
> is factor 2 worse). 7-zip also works parallel, not sure whether it works in
> linux parallel. 7za is command line version.
The question seems to be about CPU utilization as well as compression
ratios. Assuming the TIFF files are not already compressed, how fast would
you expect 7-zip to be relative to bzip2 and gzip, for both compression and
decompression? I was looking for decent bandwidth, and when I looked around
a bit it seemed that other compressors often did compress somewhat better,
but the bandwidth achieved was often 5-6x worse. So for squeezing the most
out of a 28k modem... sure. For keeping up with a 100Mbit or GigE connection
on a LAN, not so much.
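
A rough way to measure this on your own data is sketched below. It is only
a sketch: it uses the zlib, bz2 and lzma modules from the Python standard
library as stand-ins for the gzip, bzip2 and 7za command-line tools (lzma
is the same algorithm family as 7-zip's default, not the same program), it
reads the whole file into memory, and the levels chosen are assumptions,
not anything taken from the benchmarks quoted in this thread.

```python
import bz2
import lzma
import sys
import time
import zlib


def throughput_mb_per_s(compress, data, repeats=1):
    """Return (compressed/original size ratio, MB/s) for one compressor."""
    start = time.perf_counter()
    for _ in range(repeats):
        out = compress(data)
    elapsed = time.perf_counter() - start
    megabytes = len(data) * repeats / 1e6
    return len(out) / len(data), megabytes / max(elapsed, 1e-9)


# Stand-ins for the command-line tools; the levels are assumed defaults.
COMPRESSORS = {
    "gzip (zlib level 6)": lambda d: zlib.compress(d, 6),
    "bzip2 (level 9)": lambda d: bz2.compress(d, 9),
    "LZMA (xz preset 6)": lambda d: lzma.compress(d, preset=6),
}


def benchmark(data):
    """Run every compressor once over the same bytes."""
    return {name: throughput_mb_per_s(fn, data)
            for name, fn in COMPRESSORS.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    for name, (ratio, speed) in benchmark(data).items():
        print(f"{name:22s} ratio {ratio:6.1%}  {speed:8.1f} MB/s")
```

Run it with one representative uncompressed TIFF as the only argument. The
MB/s column is single-core compression speed, which is the number to
compare against the link or disk bandwidth you need to keep up with.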
Google finds:
http://blogs.reucon.com/srt/2008/02/18/compression_gzip_vs_bzip2_vs_7_zip.html
Compressor  Size   Ratio  Compression  Decompression
gzip        89 MB  54 %   0m 13s       0m 05s
bzip2       81 MB  49 %   1m 30s       0m 20s
7-zip       61 MB  37 %   1m 48s       0m 11s
So sure, you save 28MB, at the cost of 95 seconds. That might make sense if
you are transferring over a slow modem. But considering the original file was
163MB, 7-zip's 1m 48s works out to about 1.5MB/sec, nowhere near the 6MB/sec
that seems to be the target. At 1.5MB/sec per CPU you'd need 4 CPUs running
flat out for almost 4 days to manage 2TB, versus 1 CPU running for just under
2 days with gzip (163MB in 13s, about 12.5MB/sec). Definitely the kind of
thing that sounds like it might make a big difference.
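
The arithmetic can be redone from the table above with a few lines of
Python. This is a back-of-the-envelope sketch: it assumes 2TB means
2,000,000 MB, that throughput scales linearly with the number of CPUs, and
that the single 163MB test file is representative of the real data.

```python
ORIGINAL_MB = 163                      # size of the test file in the blog post
TIMES_S = {"gzip": 13, "bzip2": 90, "7-zip": 108}   # compression times
TOTAL_MB = 2_000_000                   # assumption: 2TB in decimal megabytes
SECONDS_PER_DAY = 86400


def mb_per_s(size_mb, seconds):
    """Single-CPU compression throughput."""
    return size_mb / seconds


def days_for(total_mb, rate_mb_per_s, cpus=1):
    """Wall-clock days to compress total_mb, assuming perfect scaling."""
    return total_mb / (rate_mb_per_s * cpus) / SECONDS_PER_DAY


rates = {name: mb_per_s(ORIGINAL_MB, t) for name, t in TIMES_S.items()}

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name:6s} {rate:5.1f} MB/s  "
              f"1 CPU: {days_for(TOTAL_MB, rate):5.1f} days  "
              f"4 CPUs: {days_for(TOTAL_MB, rate, 4):5.1f} days")
```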
Another example:
http://bbs.archlinux.org/viewtopic.php?t=11670
7zip compress: 19:41
Bzip2 compress: 8:56
Gzip compress: 3:00
Again 7zip is a factor of 6 and change slower than gzip.
> Linux distributions should include it default.
>
> Uses PPM, that's a new form of multidimensional compression that all that
> old junk like bzip2/gzip doesn't use.
One man's junk is another man's gold. My use was backup related and I
definitely didn't want to become CPU limited, even on large systems with 10TB
of disk and a healthy I/O system. From the sound of it, even with 8 fast
cores 7zip might easily be the bottleneck.
> TIFF files compress real bad of course. Maybe convert them to some more
> inefficient format, which increases its size probably, which then compresses
> real great with PPM.
Er, that makes no sense to me. You aren't going to end up with a smaller file
by encoding a file less efficiently. Under ideal circumstances you might get
back to where you started, after burning a substantial number of cycles. It
seems pretty simple: if the TIFFs are already compressed, just send them as
is, since significant additional compression is unlikely. If they are
uncompressed, there's a decent chance of significant lossless compression.
The best thing to do would be to try it, or at least find a reference that
tested some similar images.
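
Checking whether a TIFF is already compressed doesn't need any imaging
library: the Compression field (tag 259) in the first image directory says
so, with 1 meaning uncompressed. The helper below is a minimal sketch of
that lookup. The function name is made up for illustration, it only
handles classic TIFF (not BigTIFF), and it only looks at the first image
in the file.

```python
import struct

COMPRESSION_TAG = 259   # TIFF 6.0 "Compression" field
NO_COMPRESSION = 1      # e.g. 5 is LZW, 32773 is PackBits


def tiff_compression(data):
    """Return the Compression value from the first IFD of a classic TIFF."""
    if data[:2] == b"II":
        order = "<"     # little-endian file
    elif data[:2] == b"MM":
        order = ">"     # big-endian file
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i          # each entry is 12 bytes
        tag, _type, _n = struct.unpack(order + "HHI", data[start:start + 8])
        if tag == COMPRESSION_TAG:
            # A single SHORT is stored in the first two value bytes.
            (value,) = struct.unpack(order + "H", data[start + 8:start + 10])
            return value
    return NO_COMPRESSION   # the field defaults to 1 when it is absent
```

Reading the first few kilobytes of each file is usually enough for this,
though the directory can sit anywhere in the file, so reading the whole
thing is the safe option. Files that report 1 are the ones worth a trial
run through gzip, bzip2 and 7-zip.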