availability of Memory compression routine

Huntsinger, Reid reid_huntsinger at merck.com
Thu Jul 18 08:58:46 PDT 2002

Have you looked at the zlib library (http://www.gzip.org/zlib/), or at the
library underneath bzip2 (manual at
ftp://sources.redhat.com/pub/bzip2/docs/manual.pdf)? You might have a win
here if you can arrange the buffer size to balance compression time and
transmission time.
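
As a rough illustration of the buffer-size idea, here is a minimal sketch in
Python (not F77), using the standard zlib module, which wraps the zlib
library. The function names, the 64 KB chunk size and compression level 1 are
assumptions chosen for illustration, not recommendations; the right values
depend on the data and the network.

```python
import zlib


def compress_in_chunks(data, chunk_size=64 * 1024, level=1):
    """Compress an in-memory buffer piece by piece.

    Yields compressed pieces as they become available, so that sending
    can in principle overlap with compressing the next chunk.
    chunk_size and level are illustrative tuning knobs.
    """
    comp = zlib.compressobj(level)
    view = memoryview(data)
    for start in range(0, len(view), chunk_size):
        piece = comp.compress(view[start:start + chunk_size])
        if piece:  # zlib may buffer input and emit nothing for a while
            yield piece
    tail = comp.flush()  # emit whatever is still buffered
    if tail:
        yield tail


def decompress_chunks(pieces):
    """Reassemble the original buffer from the compressed pieces."""
    decomp = zlib.decompressobj()
    out = bytearray()
    for piece in pieces:
        out += decomp.decompress(piece)
    out += decomp.flush()
    return bytes(out)
```

Timing compress_in_chunks for a few chunk sizes and levels on a
representative array, and comparing that with the measured broadcast time,
would show whether compression pays off at all.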

Is it possible to do the local calculations directly using the
lower-triangular representation, or something even more compact (if
additional structure can be assumed)?

Reid Huntsinger

-----Original Message-----

Date: Thu, 18 Jul 2002 10:30:07 +0800 (HKT)
From: Kwan Wing Keung <hcxckwk at hkucc.hku.hk>
To: <beowulf at beowulf.org>
cc: Kwan Wing Keung <hcxckwk at hkucc.hku.hk>
Subject: availability of Memory compression routine

Dear Colleagues,

Recently I have been working on parallelizing a user's program that
involves repeated mpi_broadcast of a big 2D array (around 1000*1000
complex*16) from the master to each compute slave.  The parallelization
is now complete, but the parallel efficiency is not very high.

Basically we found that with 4-5 processors the run time drops by
around 60% (i.e. to 40% of the original serial execution time).
Increasing the number of processors further does not help.  Though the
measured CPU time for each slave goes down, the wallclock duration is
nearly flat.  Likely the network is already "saturated".

By using the Hermitian property, I can now reduce the communication
size by half (the upper triangle can be generated locally on each slave
from the elements of the lower triangle after communication).
The "saturated" time is now reduced by another 45%.
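
The halving described above can be sketched as follows, in Python for
brevity rather than Fortran. The function names and the row-by-row packing
order are assumptions made for illustration; the original program's layout
is not given.

```python
def pack_lower(a):
    """Flatten the lower triangle (diagonal included) of a square matrix."""
    n = len(a)
    return [a[i][j] for i in range(n) for j in range(i + 1)]


def unpack_hermitian(packed, n):
    """Rebuild the full n x n Hermitian matrix from its packed lower triangle."""
    full = [[0j] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            full[i][j] = packed[k]
            if i != j:
                # Hermitian: the upper element is the complex conjugate
                full[j][i] = packed[k].conjugate()
            k += 1
    return full


def packed_bytes(n, bytes_per_element=16):
    """Bytes needed for the packed lower triangle of an n x n complex*16 matrix."""
    return n * (n + 1) // 2 * bytes_per_element
```

For n = 1000 the full array is 1000*1000*16 = 16,000,000 bytes, while the
packed lower triangle is 1000*1001/2*16 = 8,008,000 bytes, i.e. just over
half.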

My question is now whether there is a generic memory compression routine
that allows a big chunk of memory to be compressed to a much smaller one,
like that used in "zip" or "compress".  Of course we are talking about
compressing a variable in memory inside a standard Fortran program, NOT
compressing a disk file.

In this case we can first compress the huge array and then use
mpi_broadcast to send the compressed data.  Upon receiving the compressed
data, each slave can decompress it to retrieve the original data.
In simple words, we are trading extra local computation for less
communication.
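
The compress, send, decompress round trip can be sketched like this, again
in Python with the standard struct and zlib modules rather than F77. The
names encode and decode are hypothetical, and the broadcast itself is not
shown: the returned byte string is what the master would send. Since the
compressed size varies with the data, the receivers would also need to be
told its length first.

```python
import struct
import zlib


def encode(values, level=6):
    """Serialize complex numbers as pairs of 8-byte floats, then compress."""
    flat = []
    for z in values:
        flat.append(z.real)
        flat.append(z.imag)
    raw = struct.pack("<%dd" % len(flat), *flat)  # little-endian doubles
    return zlib.compress(raw, level)


def decode(blob):
    """Decompress and rebuild the list of complex numbers."""
    raw = zlib.decompress(blob)
    flat = struct.unpack("<%dd" % (len(raw) // 8), raw)
    return [complex(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]
```

How much this saves depends entirely on the data: arrays with many repeated
or zero entries shrink a lot, while floating-point values whose low-order
bits look random may hardly shrink at all, so it is worth measuring on real
data before relying on it.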

Any suggestions are wholeheartedly welcome.

W.K. Kwan
Computer Centre

p.s. I prefer the compression/decompression routines in pure F77 coding,
i.e. with no recursion.




