availability of Memory compression routine
Kwan Wing Keung
hcxckwk at hkucc.hku.hk
Wed Jul 17 19:30:07 PDT 2002
Dear Colleagues,
Recently I have been working on parallelizing a user's program that
involves repeated mpi_broadcast of a big 2D array (around 1000*1000
complex*16) from the master to each compute slave. The parallelization
is now complete, but the parallel efficiency is not very high.
Basically we found that with 4-5 processors the run time drops by
around 60% (i.e. to 40% of the original serial execution time).
Further increases in the number of processors do not help. Though the
CPU time clocked for each slave goes down, the wallclock duration is
nearly flat. Likely the network is already "saturated".
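To put a rough number on the traffic: a 1000*1000 complex*16 array is
16,000,000 bytes per broadcast. The sketch below (Python, used only for
the arithmetic) estimates the time to push one copy over a single link.
The 100 Mbit/s link speed and the function names are assumptions made
for illustration, not measurements from this cluster, and protocol
overhead is ignored.

```python
def payload_bytes(n, bytes_per_element=16):
    """Size in bytes of an n*n array of complex*16 elements."""
    return n * n * bytes_per_element


def transfer_seconds(nbytes, link_bits_per_second):
    """Ideal time to move nbytes over one link, ignoring latency and overhead."""
    return nbytes * 8 / link_bits_per_second


full = payload_bytes(1000)                   # one full array
assumed_link = 100e6                         # ASSUMPTION: 100 Mbit/s link
seconds_per_copy = transfer_seconds(full, assumed_link)
print(full, "bytes,", seconds_per_copy, "s per copy at the assumed link speed")
```

If the cost of a broadcast grows with the number of receivers, this
per-copy time is paid several times per broadcast, which would be
consistent with the flat wallclock time.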
By using the Hermitian property, I can now reduce the communication
size by half (the upper triangle can be locally generated from the
elements in lower triangle in each slave after communication).
The "saturated" time is now reduced by another 45%.
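The packing idea can be sketched as follows. This is a minimal
illustration in Python rather than the Fortran of the actual program,
and the helper names (random_hermitian, pack_lower, unpack_hermitian)
are hypothetical. Only the lower triangle including the diagonal,
n*(n+1)/2 elements, would be sent; the receiver fills in the upper
triangle by complex conjugation.

```python
import random


def random_hermitian(n, seed=0):
    """Build a small n*n Hermitian test matrix as a list of lists."""
    rng = random.Random(seed)
    a = [[0j] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = complex(rng.random(), 0.0)   # diagonal is real
        for j in range(i):
            z = complex(rng.random(), rng.random())
            a[i][j] = z
            a[j][i] = z.conjugate()
    return a


def pack_lower(a):
    """Flatten the lower triangle (with diagonal) row by row."""
    n = len(a)
    return [a[i][j] for i in range(n) for j in range(i + 1)]


def unpack_hermitian(packed, n):
    """Rebuild the full matrix; upper triangle = conjugate of lower."""
    a = [[0j] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            a[i][j] = packed[k]
            if i != j:
                a[j][i] = packed[k].conjugate()
            k += 1
    return a


h = random_hermitian(4)
packed = pack_lower(h)             # this is what would be broadcast
restored = unpack_hermitian(packed, 4)
print(len(packed), "of", 4 * 4, "elements sent; round trip ok:", restored == h)
```

For n = 1000 the packed buffer holds 500,500 elements instead of
1,000,000, i.e. slightly more than half of the original message.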
My question now is whether there is a generic memory compression routine
that allows compressing a big memory chunk into a much smaller one,
like that used in "zip" or "compress". Of course we are talking about
compressing a memory variable inside a standard Fortran program, NOT
compressing a disk file.
In this case we can first compress the huge array and then use
mpi_broadcast to send the compressed data. Upon receiving the compressed
data, each slave can decompress it to retrieve the original data.
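A minimal sketch of the compress / send / decompress round trip is
given here. It is an illustration under stated assumptions, not a
recommended implementation: it uses Python's standard zlib module
(the "deflate" method used by zip) instead of a pure F77 routine, the
function names are hypothetical, and the broadcast itself is left out,
so the compressed byte string simply stands in for the message. How
much is gained depends entirely on the data; dense floating-point
values with random-looking mantissas may hardly compress at all, so
the mostly-zero test array below is a favourable case.

```python
import zlib
from array import array


def compress_complex(values, level=6):
    """Serialize complex numbers as (re, im) doubles and deflate them.

    Returns the compressed bytes and the uncompressed size in bytes.
    """
    flat = array("d")
    for z in values:
        flat.append(z.real)
        flat.append(z.imag)
    raw = flat.tobytes()
    return zlib.compress(raw, level), len(raw)


def decompress_complex(blob):
    """Inverse of compress_complex: inflate and rebuild the complex list."""
    flat = array("d")
    flat.frombytes(zlib.decompress(blob))
    return [complex(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


# Favourable case: an array that is mostly zeros.
data = [0j] * 1000
data[10] = complex(1.5, -2.0)
data[500] = complex(0.25, 3.0)

blob, raw_len = compress_complex(data)   # master side, before the broadcast
restored = decompress_complex(blob)      # slave side, after the broadcast
print(raw_len, "bytes raw,", len(blob), "bytes compressed, lossless:", restored == data)
```

Because the compressed length is not known in advance, the receivers
would need to be told the buffer size first, for example with a small
extra broadcast of the length.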
In simple words, we are trading extra local computation for less
communication.
Any suggestions are wholeheartedly welcome.
W.K. Kwan
Computer Centre
HKU
P.S. I would prefer compression/decompression routines written in pure
F77, i.e. with no recursion.