[Beowulf] Re: Accelerator for data compressing

David Mathog mathog at caltech.edu
Fri Oct 3 09:55:48 PDT 2008


Carsten Aulbert <carsten.aulbert at aei.mpg.de> wrote:

> We have a Gbit network, so for us this test is a null test: it takes
> 7-zip close to 5 minutes to compress a 311 MB data set that we could
> blow over the network in less than 5 seconds.  In that case plain tar
> would be our favorite ;)

Many compression programs have a parameter that adjusts how hard they
try to squeeze things, offering a trade-off between speed and
compression ratio.  For 7za, look at the -m switch (and the simpler
-mx=N level switch: -mx=1 is fastest, -mx=9 squeezes hardest); see:
  http://www.bugaco.com/7zip/MANUAL/switches/method.htm
That page looks a little old; on my system the same file is at:
  /usr/share/doc/p7zip/MANUAL/switches/method.htm
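To see what that knob buys you, here is a minimal sketch using
Python's standard lzma module (the same LZMA family 7-zip uses);
"dataset.bin" is just a placeholder for a representative file:

  import lzma
  import time

  data = open("dataset.bin", "rb").read()  # placeholder input file

  # preset runs 0 (fastest) .. 9 (smallest output); compare the extremes
  for preset in (0, 9):
      t0 = time.time()
      packed = lzma.compress(data, preset=preset)
      print("preset=%d: %d -> %d bytes in %.1f s"
            % (preset, len(data), len(packed), time.time() - t0))

On most data the low presets are many times faster for a modest loss
in ratio, which is often the right trade when the network is fast.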

Depending on what you are sending (and it all depends on that), you
might get some speedup from a simple run-length encoding (for data
that tends to come in long blocks) or from word or character encoding
(for highly repetitive data, like DNA), or you might not be able to do
any better no matter what compression method you use (random bits).
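For instance, a byte-level run-length encoder is only a few lines.
This is a toy sketch in Python, not a replacement for a real
compressor:

  import itertools

  def rle_encode(data: bytes) -> bytes:
      # store each run as a (count, byte) pair; runs are capped at 255
      # so the count fits in a single byte
      out = bytearray()
      for byte, run in itertools.groupby(data):
          n = len(list(run))
          while n > 0:
              out.append(min(n, 255))
              out.append(byte)
              n -= 255
      return bytes(out)

  def rle_decode(data: bytes) -> bytes:
      out = bytearray()
      for i in range(0, len(data), 2):
          out.extend(data[i + 1:i + 2] * data[i])
      return bytes(out)

  assert rle_decode(rle_encode(b"aaaabbbc")) == b"aaaabbbc"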

One of these steps is always rate limiting in data distribution: read,
compress, transmit, receive, decompress, write.  Ignoring compression,
which of these is rate limiting in your case?  If it is read or write,
you can speed things up by adopting a different storage configuration
(details vary, but underneath it comes down to spooling data off of,
or onto, N disks at once instead of just 1, multiplying the disk I/O
rate by N).
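A quick way to find the bottleneck is to time the stages separately
and compare them with your wire speed (Gbit Ethernet tops out at
roughly 120 MB/s).  A rough sketch; zlib here just stands in for
whatever compressor you actually use, and the path is a placeholder:

  import time
  import zlib

  path = "dataset.bin"  # placeholder: a representative file

  t0 = time.time()
  data = open(path, "rb").read()
  t1 = time.time()
  packed = zlib.compress(data, 6)
  t2 = time.time()

  mb = len(data) / 1e6
  # note: a repeat run reads from the page cache, so for the true disk
  # rate use a file larger than RAM or drop the caches first
  print("read:     %6.1f MB/s" % (mb / max(t1 - t0, 1e-9)))
  print("compress: %6.1f MB/s" % (mb / max(t2 - t1, 1e-9)))
  print("ratio:    %.2f" % (len(data) / len(packed)))

Whichever stage is slowest sets the pace for the whole pipeline.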

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


