[Beowulf] Accelerator for data compressing
Joe Landman landman at scalableinformatics.com
Fri Oct 3 08:45:43 PDT 2008
Carsten Aulbert wrote:
> If 7-zip can only compress data at a rate of less than say 5 MB/s (input
> data) I can much much faster copy the data over uncompressed regardless
> of how many unused cores I have in the system. Exactly for these cases I
> would like to use all cores available to compress the data fast in order
> to increase the throughput.

This is fundamentally the issue. If the compression time plus the transmit time for the compressed data is greater than the transmit time for the uncompressed data, then compression is a net loss.

Sure, if it is nothing but text files, you may get 60-80+% size reductions. But for (non-pathological) binary data, it might be only a few percent. So even if you could shave off a few percent with compression, is that worth all the extra time you spend to get it? At the end of the day, the question is how much lossless compression you can do in a short enough time for it to matter when transmitting the data.

>
> Do I miss something vital?

Nope. You got it nailed.

Several months ago, I tried moving about 600 GB of data from an old server to a JackRabbit. The old server and the JackRabbit had a gigabit link between them. We regularly saw 45 MB/s scp rates (one of the chips in the older server was a Broadcom). I tried this with and without compression. With compression (simple gzip), the copy took something like 28 hours (a little more than a day). Without compression, it was well under 10 hours. In this case, compression (gzip) was not worth it.

The commands I used for the test were:

uncompressed:

  cd /directory
  tar -cpf - ./ | ssh jackrabbit "cd /directory ; tar -xpvf -"

compressed:

  cd /directory
  tar -czpf - ./ | ssh jackrabbit "cd /directory ; tar -xzpvf -"

If you want to spend even more time, use "j" (bzip2) rather than "z" in the options.
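The break-even condition above can be written as a small model. This is a minimal sketch, assuming a constant compressor throughput and a constant compressed-to-original size ratio; the function names and every number fed into them are hypothetical, not measurements from the post. The condition as stated adds compression time and transmit time (compress first, then send); in a `tar -z | ssh` pipe the stages overlap, so the slower stage sets the pace, and both variants are shown.

```python
"""Break-even model: compress-then-send vs. send uncompressed.

All throughputs and ratios used here are hypothetical inputs.
"""


def time_uncompressed(size_bytes: float, link_Bps: float) -> float:
    """Seconds to push the raw data through a link of link_Bps bytes/s."""
    return size_bytes / link_Bps


def time_compressed_serial(size_bytes: float, link_Bps: float,
                           compress_Bps: float, ratio: float) -> float:
    """Compress everything first, then send: the two times add.

    ratio = compressed_size / original_size (0.97 means 3% saved).
    """
    return size_bytes / compress_Bps + ratio * size_bytes / link_Bps


def time_compressed_pipelined(size_bytes: float, link_Bps: float,
                              compress_Bps: float, ratio: float) -> float:
    """Compressor and link run concurrently: the slower stage dominates."""
    return max(size_bytes / compress_Bps, ratio * size_bytes / link_Bps)


def min_compress_rate(link_Bps: float, ratio: float) -> float:
    """Compressor input rate needed just to break even (serial model).

    Solving S/C + r*S/B = S/B for C gives C = B / (1 - r).
    """
    if ratio >= 1.0:
        return float("inf")  # no size reduction: compression can never pay off
    return link_Bps / (1.0 - ratio)


if __name__ == "__main__":
    MB = 1e6
    link = 100 * MB  # the 100 MB/s figure the post uses for a gigabit link
    print(f"{1e9 / link:.0f} ns per byte on the wire")
    print(f"{time_uncompressed(1 * MB, link) * 1e3:.0f} ms per MB")
    # Hypothetical binary data that shrinks by only 3% (ratio = 0.97).
    need = min_compress_rate(link, 0.97)
    print(f"compressor must sustain {need / MB:.0f} MB/s to break even")
```

With a 3% saving the serial model demands a compressor running at over 3 GB/s just to tie. In the pipelined model the condition reduces to the compressor outrunning the link (compress_Bps > link_Bps) while the ratio stays below 1, which is the per-byte time budget argument in this post.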
YMMV, but I have become convinced that, apart from specific use cases (text-only documents, or documents known to compress quickly and well), compression prior to transfer may waste more time than it saves.

That said, if someone has a parallel hack of gzip or similar that we can pipe through, by all means, I would be happy to try it. But it would have to be pretty darned efficient. 100 MB/s means 1 byte transmitted, on average, every 10 nanoseconds. So for compression to be meaningful, you would need to spend less than that per byte to increase the information density. Put another way, 1 MB takes about 10 ms to send over a gigabit link. For compression to be meaningful, you need to compress this 1 MB in far less than 10 ms and still transmit it in that time. Assuming that any compression algorithm has to walk through the data at least once, a 1 GB/s memory subsystem takes about 1 ms just to walk through this data once. So you need as few passes as possible through the data set to construct the compressed representation, as you will still have on the order of 1E+5 bytes to send.

I am not saying it is hopeless, just hard for complex compression schemes (bzip2, etc). When we get enough firepower in the CPU (or maybe GPU ... hmmmm) the situation may improve.

GPU as a compression engine? Interesting ...

Joe

>
> Cheers
>
> Carsten

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
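Joe asks for a parallel hack of gzip that can be dropped into the tar | ssh pipe, which is also what Carsten means by using all available cores to compress. A minimal sketch of one way to do it, assuming Python 3 and chunk-level parallelism: the script name pargz.py, the 4 MiB chunk size, the worker window and compression level 1 are all hypothetical choices, not anything from the thread. Each chunk becomes an independent gzip member, so the ratio is slightly worse than one continuous stream (every member starts with an empty history window), but the concatenated output is still a valid gzip stream for gunzip or `tar -xzpf -`.

```python
"""pargz.py: a hypothetical parallel gzip filter that can sit in a pipe.

Input is cut into fixed-size chunks; each chunk is compressed as an
independent gzip member on a worker thread, and members are written in
their original order. Concatenated gzip members form a valid gzip stream.
"""
import sys
import zlib
from collections import deque
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4 * 1024 * 1024  # bytes per gzip member; an arbitrary choice


def gzip_member(chunk: bytes, level: int = 1) -> bytes:
    """Compress one chunk into a complete gzip member (header, deflate, trailer).

    CPython's zlib releases the GIL while deflating, so several of these
    calls can run on separate cores. Level 1 favours speed over ratio.
    """
    # wbits=31 (16 + 15) makes zlib emit a gzip wrapper instead of a zlib one.
    c = zlib.compressobj(level, zlib.DEFLATED, 31)
    return c.compress(chunk) + c.flush()


def compress_stream(src, dst, workers: int = 4, level: int = 1) -> None:
    """Read src to EOF and write a multi-member gzip stream to dst."""
    pending = deque()  # futures, kept in submission order
    wrote_any = False
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            pending.append(pool.submit(gzip_member, chunk, level))
            # Cap memory at 2 * workers chunks in flight; drain the oldest
            # first so output order matches input order.
            while len(pending) >= 2 * workers:
                dst.write(pending.popleft().result())
                wrote_any = True
        while pending:
            dst.write(pending.popleft().result())
            wrote_any = True
    if not wrote_any:
        dst.write(gzip_member(b"", level))  # empty input -> one empty member
    dst.flush()


if __name__ == "__main__" and len(sys.argv) == 2:
    # e.g. tar -cpf - ./ | python3 pargz.py 8 | ssh host "tar -xzpf -"
    compress_stream(sys.stdin.buffer, sys.stdout.buffer, workers=int(sys.argv[1]))
```

Whether this helps is exactly the question raised in the post: it only pays off if the workers together compress faster than the link can carry raw data (about 100 MB/s for gigabit) while still shrinking it meaningfully.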
