[Beowulf] many cores and ib

Patrick Geoffray patrick at myri.com
Tue May 6 01:46:16 PDT 2008

Gilad Shainer wrote:
> It is the same benchmark that QLogic were and are using for MPI message
> rate, and I guess you know that better than me, don't you?... I want
> to make sure that when one does a comparison, he/she will be using the
> same benchmark/output to compare.

It is not the benchmark, it's the MPI implementation. The benchmark in 
itself is stupid, because it sends a gazillion messages to a single 
node. The MPI implementation is dishonest, because it says "eh, you are 
trying to send a gazillion messages to a single node, let me pack them 
into a single message on the wire for you", completely changing what the 
benchmark is trying to measure.

You are a marketing guy, you just repeat the numbers without 
understanding what they mean. Message coalescing in MVAPICH does nothing 
but make the message rate micro-benchmark irrelevant; it was designed 
that way, and only for that purpose. With message coalescing, 
*everybody* can send 20 million messages per second, as long as they 
have over 1 GB/s of bandwidth.
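
To make the arithmetic concrete, here is a rough sketch (a toy model, 
not MVAPICH code). The function names, the 8-byte payload and the 
2048-byte packing limit are assumptions chosen for illustration. It 
shows that once messages are packed, the reported "message rate" is 
just bandwidth divided by bytes per message: 1 GB/s at 20 million 
messages per second leaves 50 bytes per message.

```python
def wire_packets(num_messages, payload_bytes, coalesce_limit_bytes):
    """Count wire packets when small messages to one peer are packed together.

    Toy model: every message has the same payload size, and the sender
    packs as many as fit under coalesce_limit_bytes into one packet.
    """
    if coalesce_limit_bytes < payload_bytes:
        raise ValueError("limit must hold at least one message")
    per_packet = coalesce_limit_bytes // payload_bytes
    # Ceiling division: a partially filled last packet still goes out.
    return -(-num_messages // per_packet)


def bandwidth_bound_rate(bandwidth_bytes_per_s, bytes_per_message):
    """Messages per second if the only limit is link bandwidth."""
    return bandwidth_bytes_per_s / bytes_per_message


# Hypothetical numbers: 1000 messages of 8 bytes each.
print(wire_packets(1000, 8, 8))       # no packing: one packet per message
print(wire_packets(1000, 8, 2048))    # packing up to 2048 bytes per packet
print(bandwidth_bound_rate(1e9, 50))  # 1 GB/s with 50 bytes per message
```

With packing, the number of packets the NIC actually handles drops by 
orders of magnitude, so the benchmark no longer measures per-message 
cost on the wire.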

This is like the header caching "optimization": change the MPI tag for 
each Send in your ping-pong benchmark, and watch your latency go up. 
It's because the MPI implementation is smart enough to say "eh, you are 
sending the same message envelope over and over, let me compact the MPI 
header for you". It does not help anything but a micro-benchmark.
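
A toy model of the idea, for illustration only: the header sizes and 
the envelope layout below are made up, and this is not how any 
particular MPI implementation does it. The sender ships a full header 
only when the envelope differs from the previous one.

```python
FULL_HEADER_BYTES = 32     # hypothetical size of a full MPI header
COMPACT_HEADER_BYTES = 8   # hypothetical size of a "same as last time" header


def header_bytes_sent(tags, source=0, comm=0):
    """Total header bytes for a sequence of sends under header caching.

    Toy model: the envelope is (source, comm, tag). A send whose
    envelope matches the previous send gets the compact header.
    """
    last_envelope = None
    total = 0
    for tag in tags:
        envelope = (source, comm, tag)
        if envelope == last_envelope:
            total += COMPACT_HEADER_BYTES
        else:
            total += FULL_HEADER_BYTES
            last_envelope = envelope
    return total


# Typical ping-pong: same tag on every send, so the cache always hits.
print(header_bytes_sent([7] * 100))
# Change the tag on every send: the cache never hits.
print(header_bytes_sent(range(100)))
```

A real application that varies tags, peers or communicators sees the 
second case, which is why the saving shows up only in the 
micro-benchmark.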

I can imagine the next optimization from here: if you happen to send 
messages full of zeros in your ping-pong, MVAPICH will "compress" them 
for you. And somewhere, someone will claim a gazillion bytes per second...


