availability of Memory compression routine
Huntsinger, Reid reid_huntsinger at merck.com
Thu Jul 18 08:58:46 PDT 2002
- Previous message: E7500 Chipset
- Next message: availability of Memory compression routine
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Have you looked at the zlib library (http://www.gzip.org/zlib/)? Or the library underneath bzip2 (manual at ftp://sources.redhat.com/pub/bzip2/docs/manual.pdf)? You might have a win here if you can arrange the buffer size to balance compression time and transmission time.

Is it possible to do the local calculations directly using the lower-triangular representation, or something even more compact (if additional structure can be assumed)?

Reid Huntsinger

-----Original Message-----
Date: Thu, 18 Jul 2002 10:30:07 +0800 (HKT)
From: Kwan Wing Keung <hcxckwk at hkucc.hku.hk>
To: <beowulf at beowulf.org>
cc: Kwan Wing Keung <hcxckwk at hkucc.hku.hk>
Subject: availability of Memory compression routine

Dear Colleagues,

Recently I have been parallelizing a user's program that involves repeated mpi_broadcast of a big 2D array (around 1000*1000 complex*16) from the master to each compute slave.

The parallelization is now complete, but the parallel efficiency is not very high. We found that with 4-5 processors the run time drops by around 60% (i.e. to 40% of the original serial execution time). Increasing the number of processors further does not help: the clocked CPU time for each slave goes down, but the wallclock duration is nearly flat. Likely the network is already "saturated".

By using the Hermitian property, I can now reduce the communication size by half (the upper triangle can be generated locally on each slave from the elements of the lower triangle after communication). The "saturated" time is now reduced by another 45%.

My question is whether there is a generic memory compression routine that allows a big memory chunk to be compressed to a much smaller one, like that used in "zip" or "compress". Of course we are talking about compression of a memory variable inside a standard Fortran program, NOT compression of a disk file.

In this case we can first compress the huge array and then use mpi_broadcast to send the compressed data. Upon receiving the compressed data, each slave can decompress it to retrieve the original data. In simple words, we are trading extra local computation for less communication.

Any suggestion is wholeheartedly welcome.

W.K. Kwan
Computer Centre
HKU

p.s. I prefer the compression/decompression routines in pure F77 coding, i.e. with no recursion.
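The scheme discussed in this thread (send only the lower triangle of the Hermitian matrix, compress it in memory, broadcast the bytes, then decompress and rebuild the upper triangle on each node) can be sketched roughly as follows. This is only an illustration in Python using its standard zlib binding, not the pure F77 routine the original poster asked for. The helper names (`pack_lower`, `unpack_hermitian`, `encode`, `decode`, `make_test_matrix`) are hypothetical, and the MPI broadcast itself is left out: the `blob` returned by `encode` stands in for the buffer that would be broadcast.

```python
import struct
import zlib


def pack_lower(matrix):
    """Flatten the lower triangle (diagonal included) of a square complex
    matrix into bytes, two little-endian float64 values (re, im) per element."""
    flat = []
    for i in range(len(matrix)):
        for j in range(i + 1):
            z = matrix[i][j]
            flat.extend((z.real, z.imag))
    return struct.pack("<%dd" % len(flat), *flat)


def unpack_hermitian(payload, n):
    """Rebuild the full n x n matrix from a packed lower triangle, filling
    the upper triangle with complex conjugates (the Hermitian property)."""
    # n*(n+1)/2 elements, two doubles each
    values = struct.unpack("<%dd" % (n * (n + 1)), payload)
    matrix = [[0j] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            z = complex(values[k], values[k + 1])
            k += 2
            matrix[i][j] = z
            if i != j:
                matrix[j][i] = z.conjugate()
    return matrix


def encode(matrix, level=1):
    """Sender side: pack the lower triangle, then compress it in memory.
    A low zlib level is assumed here to keep compression time small."""
    return zlib.compress(pack_lower(matrix), level)


def decode(blob, n):
    """Receiver side: decompress in memory, then rebuild the full matrix."""
    return unpack_hermitian(zlib.decompress(blob), n)


def make_test_matrix(n):
    """Hypothetical test data: a Hermitian matrix with a real diagonal."""
    a = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            z = complex(i + j, i - j)
            a[i][j] = z
            if i != j:
                a[j][i] = z.conjugate()
    return a


if __name__ == "__main__":
    n = 64
    a = make_test_matrix(n)
    blob = encode(a)
    print("full matrix bytes:   ", n * n * 16)
    print("lower triangle bytes:", len(pack_lower(a)))
    print("compressed bytes:    ", len(blob))
    print("round trip ok:       ", decode(blob, n) == a)
```

Whether this helps depends heavily on the data. Lossless compressors such as zlib often gain little on dense floating-point values, so the ratio should be measured on the real array. As a rough model, compression only pays off when the compression time plus the decompression time plus the time to send the compressed bytes is less than the time to send the raw bytes, which is the balance between compression time and transmission time mentioned in the reply above.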
