[Beowulf] Accelerator for data compressing

Eric Thibodeau kyron at neuralbs.com
Tue Oct 7 08:45:52 PDT 2008


I've read the entire thread up to now and noticed that no one has 
mentioned the parallel bzip2 utility (pbzip2, 
http://compression.ca/pbzip2/). Although not strictly compatible with 
bzip2 when compressing large files, it's still a valid compressor imho. 
More below...

Xu, Jerry wrote:
> Hello, 
>
>  Currently I generate nearly one TB data every few days and I need to pass it
> along enterprise network to the storage center attached to my HPC system, I am
> thinking about compressing it (most tiff format image data) as much as I can, as
> fast as I can before I send it crossing network ...
I ran a test on *uncompressed* TIFF images (RAW images from a Canon 
EOS camera, iirc): many 32MB images tarred into a single 2.9GB file 
and placed in /dev/shm (to simulate a stream). Here are the timing 
figures for compressing that 2.9GB file with pbzip2:

kyron at kyron ~ $ /usr/bin/time -v pbzip2 -b15 -k /dev/shm/TEST.tar -c >/dev/shm/TEST.tar.bzip
        Command being timed: "pbzip2 -b15 -k /dev/shm/TEST.tar -c"
        User time (seconds): 415.95
        System time (seconds): 4.89
        Percent of CPU this job got: 394%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:46.56
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 0
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 3
        Minor (reclaiming a frame) page faults: 677802
        Voluntary context switches: 4196
        Involuntary context switches: 57332
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

The system has an Intel Quad Core Q6700 with 8Gigs of RAM (forced at 
667MHz due to Asus's P35 implementation limitations).

Making sense of the figures: the input file is 3046748160 bytes and the 
output file is 1427448276 bytes (roughly a 2 to 1 ratio, which will of 
course depend on image contents).

It took 106.56 seconds of wall-clock time (1:46.56, with all 4 cores 
busy) to process the 3046748160 bytes, which translates into a 
processing speed of ~28.6E6 bytes/sec. Modern HDDs and GigE surpass 
this processing speed comfortably.
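
For what it's worth, the arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the byte counts and the elapsed time are the ones quoted above, while the raw GigE line rate (1 Gbit/s, ignoring protocol overhead) and the decimal definition of 1 TB are assumptions added for comparison, not measurements.

```python
# Figures quoted in this message for the TIFF tarball test.
INPUT_BYTES = 3046748160
OUTPUT_BYTES = 1427448276
ELAPSED_SECONDS = 106.56  # 1:46.56 wall clock

# Assumptions for comparison only (not measured here).
GIGE_BYTES_PER_SEC = 1e9 / 8  # raw 1 Gbit/s line rate, no protocol overhead
ONE_TB_BYTES = 1e12           # decimal terabyte


def throughput(nbytes, seconds):
    """Input bytes consumed per second of wall-clock time."""
    return nbytes / seconds


rate = throughput(INPUT_BYTES, ELAPSED_SECONDS)
ratio = INPUT_BYTES / OUTPUT_BYTES
hours_per_tb = ONE_TB_BYTES / rate / 3600

print(f"compression rate : {rate / 1e6:.1f}E6 bytes/sec")
print(f"compression ratio: {ratio:.2f} to 1")
print(f"GigE line rate   : {GIGE_BYTES_PER_SEC / 1e6:.0f}E6 bytes/sec")
print(f"time for 1 TB    : {hours_per_tb:.1f} hours at this rate")
```

Under those assumptions the compressor, not the wire, is the slower stage, and at a 2 to 1 ratio compression would have to run at more than half the link speed before it started to pay off in transfer time.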

Since it's been brought up in the thread, here are some figures for a 
binary database (float vectors of pattern recognition 
characteristics)...and note how compressible the file ends up being:

kyron at kyron ~ $ /usr/bin/time -v pbzip2 -b15 -k ~/1_Files/1_ETS/1_Maitrise/Code/K-Means_Cavalin/featg_row.dat -c >/dev/shm/TEST.tar.bzip
        Command being timed: "pbzip2 -b15 -k /home/kyron/1_Files/1_ETS/1_Maitrise/Code/K-Means_Cavalin/featg_row.dat -c"
        User time (seconds): 1096.09
        System time (seconds): 7.12
        Percent of CPU this job got: 395%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 4:38.65
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 0
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 135624
        Voluntary context switches: 10233
        Involuntary context switches: 154397
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

featg_row.dat is 1454853604 bytes and compresses down to 238M (I guess 
there is a lot of redundancy in the features ;P ). But still, 
1454853604 bytes / 278.65 seconds = 5.2E6 bytes/sec processing speed, 
which is still insufficient compared to current hardware.
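
A minimal sketch of how the elapsed field printed by /usr/bin/time -v can be turned into seconds and then into a processing speed. The helper name elapsed_to_seconds is hypothetical; the input values are the ones quoted above.

```python
def elapsed_to_seconds(text):
    """Convert an 'h:mm:ss' or 'm:ss' elapsed field to seconds."""
    seconds = 0.0
    # Each colon-separated field is worth 60x the one to its right.
    for part in text.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Values quoted in this message for featg_row.dat.
INPUT_BYTES = 1454853604
elapsed = elapsed_to_seconds("4:38.65")

rate = INPUT_BYTES / elapsed
print(f"{elapsed:.2f} seconds -> {rate / 1e6:.1f}E6 bytes/sec")
```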

>  So, I am wondering whether
> anyone is familiar with any hardware based accelerator, which can dramatically
> improve the compressing procedure.. suggestion for any file system architecture
> will be appreciated too..  I have couple of contacts from some vendors but not
> sure whether it works as I expected, so if anyone has experience about it and
> want to share, it will be really appreciated !
>   
> Thanks,
>
> Jerry
>
> Jerry  Xu  PhD
> HPC Scientific Computing Specialist 
> Enterprise Research Infrastructure Systems (ERIS) 
> Partners Healthcare, Harvard Medical School
> http://www.partners.org
>
> The information transmitted in this electronic communication is intended only
> for the person or entity to whom it is addressed and may contain confidential
> and/or privileged material. Any review, retransmission, dissemination or other
> use of or taking of any action in reliance upon this information by persons or
> entities other than the intended recipient is prohibited. If you received this
> information in error, please contact the Compliance HelpLine at 800-856-1983 and
> properly dispose of this information.
>
>
>   
HTH

Eric Thibodeau

PS: Interesting figures: I couldn't resist compressing the same binary 
DB on a 16-core Opteron (Tyan VX50) machine and was dumbfounded to get 
horrible results given the same context. The processing speed only came 
up to 6.4E6 bytes/sec for 16 cores, and they were all at 100% during 
the entire run (FWIW, I tried different block sizes and they do have an 
impact, but that also changes the problem parameters).

eric at einstein ~ $ /usr/bin/time -v pbzip2 -b15 -k ~/1_Files/1_ETS/1_Maitrise/Code/K-Means_Cavalin/featg_row.dat -c >/dev/shm/TEST.tar.bzip
        Command being timed: "pbzip2 -b15 -k /export/livia/home/parallel/eric/1_Files/1_ETS/1_Maitrise/Code/K-Means_Cavalin/featg_row.dat -c"
        User time (seconds): 3356.42
        System time (seconds): 5.20
        Percent of CPU this job got: 1481%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 3:46.94
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 0
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 345519
        Voluntary context switches: 4088
        Involuntary context switches: 9036
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
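
To put the two featg_row.dat runs side by side, here is a rough Python sketch that derives the processing speed, the per-core speed and the total CPU time from the time -v figures quoted in this message. The machine labels and the summarize helper are only illustrative names; nothing here explains why the 16-core box does so poorly, it just makes the gap explicit.

```python
INPUT_BYTES = 1454853604  # featg_row.dat, same file in both runs

# (cores, user seconds, system seconds, elapsed seconds) from time -v.
RUNS = {
    "Q6700, 4 cores": (4, 1096.09, 7.12, 278.65),
    "VX50, 16 cores": (16, 3356.42, 5.20, 226.94),
}


def summarize(cores, user, system, elapsed):
    """Derive throughput and CPU usage figures for one run."""
    cpu_seconds = user + system
    rate = INPUT_BYTES / elapsed
    return {
        "rate": rate,
        "rate_per_core": rate / cores,
        "cpu_seconds": cpu_seconds,
        "cpu_percent": 100 * cpu_seconds / elapsed,
    }


results = {name: summarize(*run) for name, run in RUNS.items()}

for name, r in results.items():
    print(f"{name}: {r['rate'] / 1e6:.1f}E6 bytes/sec total, "
          f"{r['rate_per_core'] / 1e6:.2f}E6 bytes/sec per core, "
          f"{r['cpu_seconds']:.0f} CPU seconds, {r['cpu_percent']:.0f}% CPU")

quad = results["Q6700, 4 cores"]
vx50 = results["VX50, 16 cores"]
speedup = quad["rate"] and vx50["rate"] / quad["rate"]
cpu_cost = vx50["cpu_seconds"] / quad["cpu_seconds"]
print(f"wall-clock speedup with 4x the cores: {speedup:.2f}x")
print(f"CPU time spent on the same input    : {cpu_cost:.2f}x")
```

The second ratio is the telling one: the 16-core machine burns several times the CPU seconds of the quad core on an identical input, so the cores are busy without being proportionally productive.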


