[Beowulf] MPICH vs. OpenMPI
Jan Heichler
jan.heichler at gmx.net
Fri Apr 25 05:04:38 PDT 2008
Hello Håkon,
On Friday, 25 April 2008, you wrote:
HB> Hi Jan,
HB> At Wed, 23 Apr 2008 20:37:06 +0200, Jan Heichler <jan.heichler at gmx.net> wrote:
>> From what I saw, OpenMPI has several advantages:
>> - better performance on multi-core systems, because of a good shared-memory implementation
HB> A couple of months ago, I conducted a thorough
HB> study on intra-node performance of different MPIs
HB> on Intel Woodcrest and Clovertown systems. I
HB> systematically tested point-to-point performance
HB> between processes on a) the same die on the same
HB> socket (sdss), b) different dies on same socket
HB> (ddss) (not on Woodcrest of course) and c)
HB> different dies on different sockets (ddds). I
HB> also measured the message rate using all 4 / 8
HB> cores on the node. The point-to-point benchmarks used
HB> were ping-ping and ping-pong (Scali's 'bandwidth' and osu_latency+osu_bandwidth).
HB> I evaluated Scali MPI Connect 5.5 (SMC), SMC 5.6,
HB> HP MPI 2.0.2.2, MVAPICH 0.9.9, MVAPICH2 0.9.8, Open MPI 1.1.1.
HB> Of these, Open MPI was the slowest for all
HB> benchmarks and all machines, up to 10 times slower than SMC 5.6.
You are not going to share these benchmark results with us, are you? It would be very interesting to see them!
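For anyone not familiar with these micro-benchmarks: ping-pong simply bounces a small message between two ranks and reports half the round-trip time. A minimal sketch in C along those lines could look like the following (illustrative only - this is not Håkon's harness, and the warm-up/iteration counts are arbitrary):

/* Minimal MPI ping-pong latency sketch, in the spirit of osu_latency.
 * Illustrative only: not the harness used for the numbers quoted above. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define MSG_SIZE 8       /* 8-byte payload, as in the measurements above */
#define WARMUP   1000
#define ITERS    10000

int main(int argc, char **argv)
{
    int rank, size, i;
    char buf[MSG_SIZE];
    double t0, t1;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 processes\n");
        MPI_Finalize();
        return 1;
    }
    memset(buf, 0, sizeof(buf));

    /* warm-up so connection setup and cache effects don't skew the timing */
    for (i = 0; i < WARMUP; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
        } else {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < ITERS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
        } else {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    /* each iteration is a full round trip, so half of it is the one-way latency */
    if (rank == 0)
        printf("one-way latency: %.2f usec\n", (t1 - t0) * 1e6 / (2.0 * ITERS));

    MPI_Finalize();
    return 0;
}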
HB> Now since Open MPI 1.1.1 is quite old, I just
HB> redid the message rate measurement on an X5355
HB> (Clovertown, 2.66GHz). For an 8-byte message size,
HB> OpenMPI 1.2.2 achieves 5.5 million messages per
HB> second, whereas SMC 5.6.2 reaches 16.9 million
HB> messages per second (using all 8 cores on the node, i.e., 8 MPI processes).
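(For reference, this kind of aggregate message-rate number is what osu_mbw_mr-style tests report: several sender/receiver pairs stream small non-blocking messages in windows, and the total messages per second over all pairs is counted. A rough sketch, again purely illustrative with made-up window/iteration counts, might be:

/* Rough multi-pair message-rate sketch, in the spirit of osu_mbw_mr.
 * The first half of the ranks each stream 8-byte messages to a partner
 * in the second half; the aggregate rate over all pairs is reported. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define MSG_SIZE 8
#define WINDOW   64       /* messages in flight per pair before an ack */
#define ITERS    1000

int main(int argc, char **argv)
{
    int rank, size, i, w, pairs, partner, sender;
    char sbuf[WINDOW][MSG_SIZE], rbuf[WINDOW][MSG_SIZE], ack = 0;
    MPI_Request req[WINDOW];
    double t0, t1, elapsed, max_elapsed;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size % 2 != 0) {
        if (rank == 0) fprintf(stderr, "needs an even number of processes\n");
        MPI_Finalize();
        return 1;
    }
    memset(sbuf, 0, sizeof(sbuf));
    pairs   = size / 2;
    sender  = rank < pairs;
    partner = sender ? rank + pairs : rank - pairs;

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < ITERS; i++) {
        if (sender) {
            for (w = 0; w < WINDOW; w++)
                MPI_Isend(sbuf[w], MSG_SIZE, MPI_CHAR, partner, 0,
                          MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            /* wait for the receiver's ack so windows don't run ahead */
            MPI_Recv(&ack, 1, MPI_CHAR, partner, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            for (w = 0; w < WINDOW; w++)
                MPI_Irecv(rbuf[w], MSG_SIZE, MPI_CHAR, partner, 0,
                          MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Send(&ack, 1, MPI_CHAR, partner, 1, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    elapsed = t1 - t0;
    MPI_Reduce(&elapsed, &max_elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("aggregate rate: %.2f million msgs/s\n",
               (double)pairs * ITERS * WINDOW / max_elapsed / 1e6);

    MPI_Finalize();
    return 0;
}

End of aside.)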
HB> Comparing OpenMPI 1.2.2 with SMC 5.6.1 on
HB> ping-ping latency (usec) on an 8-byte payload yields:
HB> mapping  OpenMPI   SMC
HB> sdss        0.95   0.18
HB> ddss        1.18   0.12
HB> ddds        1.03   0.12
Impressive. But I never doubted that commercial MPIs are faster.
HB> So, Jan, I would be very curious to see any documentation of your claim above!
I did a benchmark of a customer application on an 8-node dual-socket, dual-core Opteron cluster - unfortunately I can't remember its name.
I used OpenMPI 1.2, MPICH 1.2.7p1, MVAPICH 0.97-something and Intel MPI 3.0, IIRC.
I don't have the detailed data available, but from memory:
Between nodes, latency was worst for MPICH (TCP/IP only ;-)), then Intel MPI, then OpenMPI, with MVAPICH the fastest.
On a single machine MPICH was again the worst, then MVAPICH, then OpenMPI - Intel MPI was the fastest.
The difference between MVAPICH and OpenMPI was quite big, while Intel MPI had only a small advantage over OpenMPI.
Since this was not a low-level benchmark, I don't know which communication pattern the application used, but it seemed to me that the shared-memory implementation in OpenMPI and Intel MPI was far better than in the other two.
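As a side note on the shared-memory path: with Open MPI 1.2 you can at least request the shared-memory BTL for on-node traffic explicitly, for example something like this (assuming an InfiniBand cluster, so the openib BTL; the binary name is just a placeholder):

mpirun -np 8 --mca btl self,sm,openib --mca mpi_paffinity_alone 1 ./your_app

mpi_paffinity_alone pins each MPI process to a processor, which usually matters for this kind of intra-node measurement.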
Cheers,
Jan