[Beowulf] Really efficient MPIs??

Håkon Bugge Hakon.Bugge at scali.com
Wed Nov 28 11:29:40 PST 2007

At 16:07 28.11.2007, "Michael H. Frese" <Michael.Frese at NumerEx.com> wrote:

>Oops, sorry.  Early morning typing-while-sleeping.
>The latencies claimed by Argonne for core-to-core
>on-board  communication with MPICH2 compiled using the ch3:nemesis
>device are 0.3-0.5 microseconds, not 0.06.  There's also no claim
>about what happens when you use it for mixed on-board and off-board comms.
>Our recent dual-core 64-bit AMD boards get 0.6 microsecond latency
>core-to-core, while our older 32-bit ones get 1.6.  That's all by 
>netpipe test.


Unless you use an MPI which lets you control how processes are bound 
to cores (or use taskset), you really don't know what you're measuring.

On modern systems, two cores could be a) on the same die, b) on the 
same socket but different dies, c) on different sockets, or d) on 
different sockets where the traffic is routed through a third one. 
Moreover, on Clovertown, the snoop filter could be enabled or disabled.

So, a core-to-core comparison by two different people, using 
different MPIs and different systems, probably measures two different 
things ;-)

