[Beowulf] MPICH vs. OpenMPI

Fri Apr 25 05:04:38 PDT 2008

Hello Håkon,

On Friday, 25 April 2008, you wrote:

HB> Hi Jan,

HB> At Wed, 23 Apr 2008 20:37:06 +0200, Jan Heichler <jan.heichler at gmx.net> wrote:
>> From what I saw, OpenMPI has several advantages:

>> - better performance on multicore systems
>>   because of a good shared-memory implementation

HB> A couple of months ago, I conducted a thorough
HB> study of the intra-node performance of different MPIs
HB> on Intel Woodcrest and Clovertown systems. I
HB> systematically tested point-to-point performance
HB> between processes on a) the same die on the same
HB> socket (sdss), b) different dies on the same socket
HB> (ddss) (not on Woodcrest, of course) and c)
HB> different dies on different sockets (ddds). I
HB> also measured the message rate using all 4 / 8
HB> cores on the node. The point-to-point benchmarks used
HB> were ping-ping and ping-pong (Scali's `bandwidth'
HB> and osu_latency + osu_bandwidth).

HB> I evaluated Scali MPI Connect 5.5 (SMC), SMC 5.6, 
HB> HP MPI 2.0.2.2, MVAPICH 0.9.9, MVAPICH2 0.9.8, Open MPI 1.1.1.

HB> Of these, Open MPI was the slowest in all
HB> benchmarks on all machines, up to 10 times slower than SMC 5.6.

I suppose you are not going to share these benchmark results with us? It would be very interesting to see them!

HB> Now, since Open MPI 1.1.1 is quite old, I just
HB> redid the message-rate measurement on an X5355
HB> (Clovertown, 2.66 GHz). With an 8-byte message size,
HB> Open MPI 1.2.2 achieves 5.5 million messages per
HB> second, whereas SMC 5.6.2 reaches 16.9 million
HB> messages per second (using all 8 cores on the node, i.e., 8 MPI processes).

HB> Comparing OpenMPI 1.2.2 with SMC 5.6.1 on 
HB> ping-ping latency (usec) on an 8-byte payload yields:

HB> mapping OpenMPI   SMC
HB> sdss       0.95  0.18
HB> ddss       1.18  0.12
HB> ddds       1.03  0.12
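For reference, the relative differences implied by the figures quoted above can be worked out directly. This is a minimal Python sketch that only re-derives ratios from the numbers in this thread (Open MPI 1.2.2 vs. SMC 5.6.1 for latency, SMC 5.6.2 for message rate); the names `latency_usec` and `slowdown` are illustrative, not part of any benchmark suite.

```python
# Figures quoted in this thread (X5355 Clovertown, 8-byte messages).
# Latency: ping-ping, microseconds, Open MPI 1.2.2 vs. SMC 5.6.1.
latency_usec = {
    "sdss": (0.95, 0.18),  # same die, same socket
    "ddss": (1.18, 0.12),  # different dies, same socket
    "ddds": (1.03, 0.12),  # different dies, different sockets
}

def slowdown(openmpi, smc):
    """How many times higher Open MPI's latency is than SMC's."""
    return openmpi / smc

for mapping, (ompi, smc) in latency_usec.items():
    print(f"{mapping}: {slowdown(ompi, smc):.1f}x")
# sdss: 5.3x
# ddss: 9.8x
# ddds: 8.6x

# Message rate, all 8 cores: millions of messages per second
# (Open MPI 1.2.2 vs. SMC 5.6.2).
rate_openmpi, rate_smc = 5.5, 16.9
print(f"message rate: SMC {rate_smc / rate_openmpi:.1f}x higher")
# message rate: SMC 3.1x higher
```

So on these numbers the latency gap ranges from about 5x to nearly 10x depending on process placement, while the aggregate message-rate gap is about 3x.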

Impressive. But I never doubted that commercial MPIs are faster.

HB> So, Jan, I would be very curious to see any documentation of your claim above!

I benchmarked a customer application on an 8-node dual-socket, dual-core Opteron cluster; unfortunately, I can't remember the name.

I used OpenMPI 1.2, MPICH 1.2.7p1, MVAPICH 0.97-something and Intel MPI 3.0, IIRC.

I don't have the detailed data available, but from memory:

Across nodes, latency was worst for MPICH (just TCP/IP ;-)), followed by Intel MPI, then OpenMPI, with MVAPICH the fastest.
On a single machine, MPICH was the worst, then MVAPICH, then OpenMPI; Intel MPI was the fastest.

The difference between MVAPICH and OpenMPI was quite big, whereas Intel MPI had only a small advantage over OpenMPI.

Since this was not a low-level benchmark, I don't know which communication pattern the application used, but it seemed to me that the shared-memory configuration in OpenMPI and Intel MPI was far better than in the other two.

Cheers,
Jan