[Beowulf] Really efficient MPIs??
Michael H. Frese
Michael.Frese at NumerEx.com
Wed Nov 28 07:06:29 PST 2007
>At 10:31 PM 11/27/2007, you wrote:
>>Hello,
>>
>>Clusters with multicore nodes are quite common today, and the cores
>>within a node share memory.
>>
>>Which implementations of MPI (commercial or free) make automatic and
>>efficient use of shared memory for message passing within a node?
>>(That is, which MPI libraries automatically communicate over shared
>>memory instead of over the interconnect within the same node?)
>>
>>regards,
>>Ali.
>
>The latest MPICH2 from Argonne (maybe version 1.0.6), compiled for
>the ch3:nemesis shared memory device, has very low latency -- as low
>as 0.06 microseconds -- and very high bandwidth. It beats LAM in
>Argonne's tests. Here are details:
>
>www.pvmmpi06.org/talks/CommProt/buntinas.pdf
>info.mcs.anl.gov/pub/tech_reports/reports/P1346.pdf
>ftp.mcs.anl.gov/pub/mpi/mpich2-doc-CHANGES.txt
>
>We are getting higher latencies than that on various hardware, so
>obviously YMMV.
>
>
>Mike
Oops, sorry. Early morning typing-while-sleeping.
The latencies claimed by Argonne for core-to-core
on-board communication with MPICH2 compiled using the ch3:nemesis
device are 0.3-0.5 microseconds, not 0.06. There's also no claim
about what happens when you use it for mixed on-board and off-board comms.
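(For anyone trying to reproduce this: as far as I know the nemesis
channel is chosen when MPICH2 is built, with a configure option along
the lines of --with-device=ch3:nemesis. Check the install notes for
your version for the exact flag.)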
Our recent dual-core 64-bit AMD boards get 0.6 microseconds latency
core-to-core, while our older 32-bit ones get 1.6 microseconds. Those
numbers are all from NetPIPE tests.
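If you want to sanity-check your own boards without installing
NetPIPE, a bare-bones ping-pong along the lines of what NetPIPE does
at small message sizes looks roughly like this. This is my own sketch,
not NetPIPE's actual code, and the timing method is crude, so treat
the output as illustrative only:

/* Minimal MPI ping-pong latency sketch.  Run both ranks on one node,
 * e.g. "mpiexec -n 2 ./pingpong", so the shared memory path is the
 * one being timed. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, i;
    char byte = 0;
    const int reps = 100000;    /* round trips to average over */
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);    /* start both ranks together */
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            /* rank 0 sends one byte, waits for the echo */
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            /* rank 1 echoes the byte back */
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        /* each rep is a round trip, so divide by 2*reps for one-way */
        printf("one-way latency: %g usec\n",
               (t1 - t0) / (2.0 * reps) * 1e6);

    MPI_Finalize();
    return 0;
}

Compile with the mpicc from the MPICH2 build you are testing, and
make sure both ranks really land on the same node, or you will be
timing the interconnect instead of nemesis.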
Mike