[Beowulf] Accelerator for data compressing
Vincent Diepeveen
diep at xs4all.nl
Fri Oct 3 03:53:49 PDT 2008
Hi Carsten,
Ah, you googled for two seconds and found some old homepage. Try this
one instead: www.maximumcompression.com
The testing there is far better. Note that every program compresses
the same test set, so the results are directly comparable.
In real life, database-type data has all kinds of patterns which
PPM-type compressors find.
My experience is that at the terabyte level, the top compressors at
maximumcompression.com are a bit too slow (PAQ) and end up doing no
better than simple tools like 7-zip. Look especially at the compressed
sizes and the decompression times.
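To get a feel for that trade-off on your own data, here is a minimal
sketch (my own illustration, not anything from the benchmark site) that
times compression and decompression with Python's standard-library
codecs; the file name "sample.dat" is a placeholder:

import bz2, gzip, lzma, time

def benchmark(name, compress, decompress, data):
    t0 = time.perf_counter()
    packed = compress(data)          # one-shot compression
    t1 = time.perf_counter()
    decompress(packed)               # one-shot decompression
    t2 = time.perf_counter()
    print(f"{name:6} size={len(packed):10d} "
          f"compress={t1 - t0:6.2f}s decompress={t2 - t1:6.2f}s")

data = open("sample.dat", "rb").read()
benchmark("gzip",  gzip.compress, gzip.decompress, data)
benchmark("bzip2", bz2.compress,  bz2.decompress,  data)
# lzma is the same algorithm family 7-zip uses by default
benchmark("lzma",  lzma.compress, lzma.decompress, data)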
The main thing you want to limit is the amount of traffic over your
network. Really good compression is very helpful there. How long
compression takes is hardly relevant, as long as it doesn't take a
near-infinite amount of time (I remember a New Zealand compressor which
took 24 hours to compress 100 MB of data). Note that we are already at
a point where compression time hardly matters; you can buy a GPU to
offload it from your servers.
Query time (that is, decompression time) is important though.
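To put numbers on that, here is a back-of-envelope sketch; all the
figures below are made-up placeholders, so plug in your own link speed
and decompressor throughput:

# Compare shipping raw bytes against shipping compressed bytes and
# unpacking them at query time. Single-stream model for simplicity.
raw_bytes  = 1e12      # 1 TB of data to move (placeholder)
ratio      = 0.20      # compressed size / raw size (placeholder)
link_Bps   = 1e9 / 8   # 1 Gbit/s link, in bytes per second
decomp_Bps = 200e6     # decompression throughput, bytes/s (placeholder)

t_raw  = raw_bytes / link_Bps                                   # ship raw
t_comp = raw_bytes * ratio / link_Bps + raw_bytes / decomp_Bps  # ship+unpack

print(f"raw transfer:        {t_raw  / 3600:5.1f} h")
print(f"compressed + unpack: {t_comp / 3600:5.1f} h")
# Compression pays off whenever ratio < 1 - link_Bps / decomp_Bps,
# so a faster decompressor (or more cores) widens the win.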
If we look at the graphics test there:
Pos  Program                       Switches       Size (bytes)   %      Bits/byte
026  7-Zip 4.60b                   -m0=ppmd:o=4        764420    81.58   1.4738
...
094  BZIP2 1.0.5                   -9                  890163    78.55   1.7162
...
158  PKZIP 2.50                    -exx               1250536    69.86   2.4110
159  HIT 2.10                      -x                 1250601    69.86   2.4111
160  GZIP 1.3.5                    -9                 1254351    69.77   2.4184
161  ZIP 2.2                       -9                 1254444    69.77   2.4185
162  WINZIP 8.0 (Max Compression)                     1254444    69.77   2.4185
Note that a real supercompressor gets it even smaller:

003  WinRK 3.0.3                   PWCM 912MB          568919    86.29   1.0969
Again, all these tests are at the micro level: just a few megabytes of
data getting compressed. You don't build a big infrastructure for a few
megabytes, so they are not that relevant.
The traffic over your network dominates there, and there are plenty of
idle server cores; in fact, so many companies now buy dual cores
because they do not know how to keep the cores of a quad core busy.
This is all micro level. Things really change when you have terabytes
to compress and HUGE files. Bzip2 is painfully slow on gigabyte-sized
files; 7-zip totally beats it there.
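At that size you also want to stream instead of reading whole files
into RAM. A minimal sketch (my own illustration; "huge.dat" is a
placeholder) that pushes a big file through an incremental LZMA
compressor, 64 MB at a time, so memory use stays flat:

import lzma

CHUNK = 64 * 1024 * 1024  # 64 MB per read keeps memory use bounded

comp = lzma.LZMACompressor()
with open("huge.dat", "rb") as src, open("huge.dat.xz", "wb") as dst:
    while chunk := src.read(CHUNK):
        dst.write(comp.compress(chunk))   # may buffer, may emit output
    dst.write(comp.flush())               # emit whatever is still buffered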
Vincent
On Oct 3, 2008, at 11:27 AM, Carsten Aulbert wrote:
> Hi all
>
> Bill Broadley wrote:
>>
>> Another example:
>> http://bbs.archlinux.org/viewtopic.php?t=11670
>>
>> 7zip compress: 19:41
>> Bzip2 compress: 8:56
>> Gzip compress: 3:00
>>
>> Again 7zip is a factor of 6 and change slower than gzip.
>
> Have you looked into threaded/parallel bzip2?
>
> freshmeat has a few of those, e.g.
>
> http://freshmeat.net/projects/bzip2smp/
> http://freshmeat.net/projects/lbzip2/
>
> (with the usual disclaimer that I haven't tested them myself).
>
> HTH
>
> carsten
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
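P.S. On those parallel bzip2 links: a minimal sketch (my own
illustration, not code from lbzip2 or bzip2smp) of the pbzip2-style
trick: compress fixed-size blocks on separate cores and concatenate
the streams; plain bunzip2 can still decompress the result, since
bzip2 accepts multi-stream files.

import bz2
from multiprocessing import Pool

BLOCK = 8 * 1024 * 1024  # 8 MB of input per work unit

def read_blocks(path):
    with open(path, "rb") as f:
        while block := f.read(BLOCK):
            yield block

if __name__ == "__main__":
    with Pool() as pool, open("big.dat.bz2", "wb") as out:
        # imap preserves block order while compressing on all cores
        for packed in pool.imap(bz2.compress, read_blocks("big.dat")):
            out.write(packed)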