<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title></title>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
body {
margin: 5px 5px 5px 5px;
background-color: #ffffff;
}
/* ---------- Text Styles ---------- */
hr { color: #000000}
body, table /* Normal text */
{
font-size: 9pt;
font-family: 'Courier New';
font-style: normal;
font-weight: normal;
color: #000000;
text-decoration: none;
}
span.rvts1 /* Heading */
{
font-size: 10pt;
font-family: 'Arial';
font-weight: bold;
color: #0000ff;
}
span.rvts2 /* Subheading */
{
font-size: 10pt;
font-family: 'Arial';
font-weight: bold;
color: #000080;
}
span.rvts3 /* Keywords */
{
font-size: 10pt;
font-family: 'Arial';
font-style: italic;
color: #800000;
}
a.rvts4, span.rvts4 /* Jump 1 */
{
font-size: 10pt;
font-family: 'Arial';
color: #008000;
text-decoration: underline;
}
a.rvts5, span.rvts5 /* Jump 2 */
{
font-size: 10pt;
font-family: 'Arial';
color: #008000;
text-decoration: underline;
}
span.rvts6
{
font-weight: bold;
color: #800000;
}
a.rvts7, span.rvts7
{
color: #0000ff;
text-decoration: underline;
}
span.rvts8
{
font-weight: bold;
color: #800000;
}
span.rvts9
{
font-weight: bold;
color: #800080;
}
/* ---------- Para Styles ---------- */
p,ul,ol /* Paragraph Style */
{
text-align: left;
text-indent: 0px;
padding: 0px 0px 0px 0px;
margin: 0px 0px 0px 0px;
}
.rvps1 /* Centered */
{
text-align: center;
}
--></style>
</head>
<body>
<p>Hello Håkon,</p>
<p><br></p>
<p>On Friday, 25 April 2008, you wrote:</p>
<p><br></p>
<p><span class=rvts6>HB> Hi Jan,</span></p>
<p><br></p>
<p><span class=rvts6>HB> At Wed, 23 Apr 2008 20:37:06 +0200, Jan Heichler <</span><a class=rvts7 href="mailto:jan.heichler@gmx.net">jan.heichler@gmx.net</a><span class=rvts8>> wrote:</span></p>
<p><span class=rvts9>>> From what I saw OpenMPI has several advantages:</span></p>
<p><br></p>
<p><span class=rvts9>>>- better performance on MultiCore Systems </span></p>
<p><span class=rvts9>>>because of good shared-memory-implementation</span></p>
<p><br></p>
<p><br></p>
<p><span class=rvts6>HB> A couple of months ago, I conducted a thorough </span></p>
<p><span class=rvts6>HB> study on intra-node performance of different MPIs </span></p>
<p><span class=rvts6>HB> on Intel Woodcrest and Clovertown systems. I </span></p>
<p><span class=rvts6>HB> systematically tested pnt-to-pnt performance </span></p>
<p><span class=rvts6>HB> between processes on a) the same die on the same </span></p>
<p><span class=rvts6>HB> socket (sdss), b) different dies on same socket </span></p>
<p><span class=rvts6>HB> (ddss) (not on Woodcrest of course) and c) </span></p>
<p><span class=rvts6>HB> different dies on different sockets (ddds). I </span></p>
<p><span class=rvts6>HB> also measured the message rate using all 4 / 8 </span></p>
<p><span class=rvts6>HB> cores on the node. The pnt-to-pnt benchmarks used </span></p>
<p><span class=rvts6>HB> were ping-ping and ping-pong (Scali's 'bandwidth' and osu_latency+osu_bandwidth).</span></p>
<p><br></p>
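<p>(For readers who haven't seen these microbenchmarks: the ping-pong pattern that osu_latency and similar tools measure looks roughly like the sketch below - not Scali's or OSU's actual code, and the iteration counts and 8-byte size are only illustrative.)</p>
<pre>
/* Ping-pong latency sketch: rank 0 sends a small message to rank 1,
 * which echoes it back; half of the averaged round-trip time is
 * reported as the one-way latency. */
#include &lt;mpi.h&gt;
#include &lt;stdio.h&gt;

int main(int argc, char **argv)
{
    const int iters = 10000, warmup = 1000, size = 8;
    char buf[8];
    int rank;
    double t0 = 0.0;

    MPI_Init(&amp;argc, &amp;argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &amp;rank);

    for (int i = 0; i &lt; iters + warmup; i++) {
        if (i == warmup)
            t0 = MPI_Wtime();            /* start timing after warm-up */
        if (rank == 0) {
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    if (rank == 0)
        printf("one-way latency: %.2f usec\n",
               (MPI_Wtime() - t0) * 1e6 / (2.0 * iters));

    MPI_Finalize();
    return 0;
}
</pre>
<p>Ping-ping is the same idea, except that both sides send before they receive, so the messages cross in flight.</p>
<p><br></p>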
<p><span class=rvts6>HB> I evaluated Scali MPI Connect 5.5 (SMC), SMC 5.6, </span></p>
<p><span class=rvts6>HB> HP MPI 2.0.2.2, MVAPICH 0.9.9, MVAPICH2 0.9.8, Open MPI 1.1.1.</span></p>
<p><br></p>
<p><span class=rvts6>HB> Of these, Open MPI was the slowest for all </span></p>
<p><span class=rvts6>HB> benchmarks and all machines, up to 10 times slower than SMC 5.6.</span></p>
<p><br></p>
<p><br></p>
<p>You're not going to share these benchmark results with us, are you? It would be very interesting to see them!</p>
<p><br></p>
<p><span class=rvts6>HB> Now since Open MPI 1.1.1 is quite old, I just </span></p>
<p><span class=rvts6>HB> redid the message rate measurement on an X5355 </span></p>
<p><span class=rvts6>HB> (Clovertown, 2.66GHz). On an 8-byte message size, </span></p>
<p><span class=rvts6>HB> OpenMPI 1.2.2 achieves 5.5 million messages per </span></p>
<p><span class=rvts6>HB> second, whereas SMC 5.6.2 reaches 16.9 million </span></p>
<p><span class=rvts6>HB> messages per second (using all 8 cores on the node, i.e., 8 MPI processes).</span></p>
<p><br></p>
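<p>(A rough sketch of what such a message-rate run does - in the style of osu_mbw_mr, not the actual benchmark, with made-up window and iteration counts, and assuming an even number of ranks: half of the ranks stream windows of 8-byte non-blocking sends to partners in the other half.)</p>
<pre>
/* Message-rate sketch: senders post WINDOW nonblocking 8-byte sends per
 * iteration, receivers post matching receives; the aggregate message
 * count divided by the elapsed time gives messages per second.
 * Assumes an even number of ranks. */
#include &lt;mpi.h&gt;
#include &lt;stdio.h&gt;

#define WINDOW 64
#define ITERS  1000
#define MSGSZ  8

int main(int argc, char **argv)
{
    int rank, nprocs;
    char buf[MSGSZ];
    MPI_Request req[WINDOW];

    MPI_Init(&amp;argc, &amp;argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &amp;rank);
    MPI_Comm_size(MPI_COMM_WORLD, &amp;nprocs);

    int half    = nprocs / 2;
    int sender  = (rank &lt; half);
    int partner = sender ? rank + half : rank - half;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int it = 0; it &lt; ITERS; it++) {
        for (int w = 0; w &lt; WINDOW; w++) {
            if (sender)
                MPI_Isend(buf, MSGSZ, MPI_CHAR, partner, 0, MPI_COMM_WORLD, &amp;req[w]);
            else
                MPI_Irecv(buf, MSGSZ, MPI_CHAR, partner, 0, MPI_COMM_WORLD, &amp;req[w]);
        }
        MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    double elapsed = MPI_Wtime() - t0;

    if (rank == 0)
        printf("aggregate rate: %.2f million msgs/s\n",
               (double)half * ITERS * WINDOW / elapsed / 1e6);

    MPI_Finalize();
    return 0;
}
</pre>
<p><br></p>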
<p><span class=rvts6>HB> Comparing OpenMPI 1.2.2 with SMC 5.6.1 on </span></p>
<p><span class=rvts6>HB> ping-ping latency (usec) on an 8-byte payload yields:</span></p>
<p><br></p>
<p><span class=rvts6>HB> mapping&nbsp;&nbsp;OpenMPI&nbsp;&nbsp;SMC</span></p>
<p><span class=rvts6>HB> sdss&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0.95&nbsp;&nbsp;0.18</span></p>
<p><span class=rvts6>HB> ddss&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1.18&nbsp;&nbsp;0.12</span></p>
<p><span class=rvts6>HB> ddds&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1.03&nbsp;&nbsp;0.12</span></p>
<p><br></p>
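<p>(How the three mappings were pinned isn't stated above; one way to reproduce such placements by hand, independent of any MPI binding options, is to have each rank pin itself to an explicitly chosen core, e.g. with sched_setaffinity on Linux. The sketch below does that; which core IDs share a die or a socket is machine-specific and has to be looked up, e.g. in /proc/cpuinfo.)</p>
<pre>
/* Placement sketch: each rank pins itself to the core ID passed on the
 * command line, so sdss/ddss/ddds mappings can be reproduced by choosing
 * core IDs that share a die, share only a socket, or share neither.
 * Linux-specific. */
#define _GNU_SOURCE
#include &lt;sched.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;mpi.h&gt;

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&amp;argc, &amp;argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &amp;rank);

    /* e.g. "mpirun -np 2 ./pingpong 0 2" pins rank 0 to core 0, rank 1 to core 2 */
    if (argc &gt; rank + 1) {
        cpu_set_t set;
        CPU_ZERO(&amp;set);
        CPU_SET(atoi(argv[rank + 1]), &amp;set);
        sched_setaffinity(0, sizeof(set), &amp;set);   /* 0 = calling process */
    }

    /* ... run the ping-ping / ping-pong kernel here ... */

    MPI_Finalize();
    return 0;
}
</pre>
<p><br></p>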
<p>Impressive numbers. But I never doubted that commercial MPIs are faster. </p>
<p><br></p>
<p><span class=rvts6>HB> So, Jan, I would be very curios to see any documentation of your claim above!</span></p>
<p><br></p>
<p>I did a benchmark of a customer application on an 8-node dual-socket, dual-core Opteron cluster - unfortunately I can't remember the name. </p>
<p><br></p>
<p>I used OpenMPI 1.2, mpich 1.2.7p1, mvapich 0.97-something and Intel MPI 3.0, IIRC.</p>
<p><br></p>
<p>I don't have the detailed data available but from my memory:</p>
<p><br></p>
<p>Latency was worst for mpich (just TCP/IP ;-) ), then Intel MPI, then OpenMPI, with mvapich the fastest. </p>
<p>On a single machine mpich was the worst, then mvapich, then OpenMPI - Intel MPI was the fastest. </p>
<p><br></p>
<p>The difference between mvapich and OpenMPI was quite big - Intel MPI had just a small advantage over OpenMPI. </p>
<p><br></p>
<p><br></p>
<p>Since this was not a low-level benchmark I don't know which communication pattern the application used, but it seemed to me that the shared-memory configuration of OpenMPI and Intel MPI was far better than that of the other two. </p>
<p><br></p>
<p>Cheers,</p>
<p>Jan</p>
</body></html>