[Beowulf] Performance characterising a HPC application

Richard Walsh rbw at ahpcrc.org
Mon Mar 26 15:01:45 PDT 2007

Greg Lindahl wrote:
> On Mon, Mar 26, 2007 at 03:58:59PM -0500, Richard Walsh wrote:
>> but of course aggregation is a legitimate optimization technique
>> because not all message patterns are of the Gups variety just as not
>> all memory references are absent locality.
> This is true, although I would call it more of an "ease of use" issue.
> Everyone in the MPI arena already knows they're supposed to send as
> big of a message as possible, so it's fairly rare to find
> high-performance MPI codes that see an improvement with message
    Agreed ... but ... without drifting too far off topic ...
> aggregation. In an ideal world the programmer wouldn't have to
> explicitly think about aggregation, it would just happen. But today's
> codes don't assume that.
    it becomes a requirement for obtaining performance in more implicit
    parallel programming models like UPC (unless you Balkanize it with
    programmed aggregation via upc gets/puts or MPI-like UPC collectives).
    Without a global memory vector operation/instruction you cannot
    amortize latency using a pipeline as you can on the Cray X1E.  Perhaps
    that is the ideal world that you are referring to ... ;-) ... although
    aggregation has advantages over pipelining that stand on their own.
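
    To put rough numbers on the latency-amortization point, here is a
    minimal sketch using the usual latency/bandwidth cost model
    (time = alpha + nbytes / beta).  The model is a simplification, and
    the values of ALPHA_S and BETA_BPS are illustrative assumptions, not
    measurements of any interconnect; the function names are made up for
    this example.

```python
# Latency/bandwidth ("alpha-beta") cost model: time = alpha + nbytes / beta.
# ALPHA_S and BETA_BPS are illustrative assumptions, not measured values.

ALPHA_S = 2e-6    # assumed per-message latency, in seconds
BETA_BPS = 1e9    # assumed streaming bandwidth, in bytes per second


def message_time(nbytes, alpha=ALPHA_S, beta=BETA_BPS):
    """Modeled time to send a single message of nbytes."""
    return alpha + nbytes / beta


def unaggregated_time(count, nbytes, alpha=ALPHA_S, beta=BETA_BPS):
    """count separate messages: the latency term is paid count times."""
    return count * message_time(nbytes, alpha, beta)


def aggregated_time(count, nbytes, alpha=ALPHA_S, beta=BETA_BPS):
    """One message carrying all count payloads: latency is paid once."""
    return message_time(count * nbytes, alpha, beta)


def speedup(count, nbytes, alpha=ALPHA_S, beta=BETA_BPS):
    """Ratio of unaggregated to aggregated modeled time."""
    return (unaggregated_time(count, nbytes, alpha, beta)
            / aggregated_time(count, nbytes, alpha, beta))


if __name__ == "__main__":
    # 1000 transfers each of a small, medium, and large payload.
    for nbytes in (8, 1024, 1 << 20):
        print(f"{nbytes:>8} B x 1000: speedup {speedup(1000, nbytes):.1f}x")
```

    Under these assumed parameters the fine-grained case (8-byte
    transfers, the sort of remote reference an implicit model generates)
    gains a great deal from aggregation, while messages that are already
    large gain almost nothing, which fits the observation that tuned MPI
    codes rarely see an improvement.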
> I only referred to GUPs as it's a widely available microbenchmark
> which is not gamed by this optimization. But message rate and
> streaming bandwidth are completely wrecked.
    Understood ...



