[Beowulf] New HPCC results, and an MX question

Patrick Geoffray patrick at myri.com
Tue Jul 19 19:12:22 PDT 2005

Greg Lindahl wrote:
> interconnects, you'll note that many of them get a much worse random
> ring latency than ordinary ping-pong.

Nope. It's worse because:
* they use much larger clusters: as the size of the cluster increases, 
the number of hops increases, and thus the worst-case latency increases. 
16 nodes is a tiny cluster with just a one-hop worst case.
* they use older hardware: 2.6 GHz Opterons are not exactly old.
* they use older drivers: customers have better things to do than 
run benchmarks in a carefully crafted environment with a carefully 
optimized driver/lib.
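The cluster-size point above can be illustrated with a toy topology model. The sketch below assumes an idealized full fat tree built from radix-16 switches; the model and numbers are illustrative only, not any specific vendor's network:

```python
def worst_case_hops(nodes, radix=16):
    """Worst-case number of switch crossings in an idealized full
    fat tree built from radix-port switches, where half the ports
    feed the next level up.  Illustrative model only."""
    if nodes <= radix:
        return 1  # everything fits on one switch: one hop worst case
    levels = 1
    capacity = radix
    while capacity < nodes:
        levels += 1
        capacity *= radix // 2  # each extra level multiplies capacity
    # worst case: up to the top of the tree and back down
    return 2 * levels - 1

print(worst_case_hops(16))    # a 16-node cluster: 1 hop worst case
print(worst_case_hops(1024))  # a large cluster needs several hops
```

With a fixed per-hop cost, the worst-case (and hence random ring) latency grows with cluster size even when the ping-pong latency between neighbors stays the same.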

> Second, I have a question about Myrinet MX performance. Myricom has
> better things to do than answer my performance queries (no surprise,

No. Myricom did answer your query: not the way you wanted, but we 
replied to you the same day.

> every company prefers to answer customer queries first).  With GM,
> Myricom published the raw output from the Pallas benchmark, and that
> was very useful for doing comparisons. With MX, Myricom hasn't

That was also very useful for competitors publishing bogus comparisons 
across different configurations (different hardware, different 
software). That's why we stopped publishing the raw data.

By the way, could you point me to the raw performance data on the 
pathscale web pages?

> published the raw data, but they did publish graphs. The claimed
> 0-byte latency is 2.6 usec, with no explanation of what benchmark was
> used. The graph at:

From the page: "Performance data is presented for the Pallas MPI 
Benchmark Suite, Version 2.2". It's in bold, but maybe we should write 
it in red, blinking...

> MB/s. That corresponds to a 3.1 to 3.4 usec 0-byte bandwidth. The
> bandwidth for 64 bytes and 128 bytes seem to support this number, too.
> So, the question is, am I full of it? Wait, don't answer that! The

Full of what? I can think of a few things...

Anyway, the cluster I ran Pallas on had a 0-byte MPI latency of 2.9 us. 
Why? Because it's a production cluster, deployed over a year ago, with 
1.4 GHz Opteron CPUs (compare that with your 2.6 GHz).
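For what it's worth, the back-of-the-envelope estimate being argued about above (inferring a 0-byte latency from small-message bandwidth readings on a graph) can be sketched as follows. The bandwidth figures here are made up for illustration, not read off anyone's actual chart:

```python
def latency_from_bandwidth(size_bytes, bandwidth_mb_per_s):
    """Estimate per-message time in microseconds from a small-message
    bandwidth reading.  At tiny sizes the transfer is latency-dominated,
    so time-per-message ~= size / bandwidth.
    Note: bytes / (MB/s) comes out directly in microseconds."""
    return size_bytes / bandwidth_mb_per_s

# Hypothetical readings: if a graph showed ~20 MB/s at 64 bytes and
# ~40 MB/s at 128 bytes, both would imply roughly 3.2 us per message.
print(latency_from_bandwidth(64, 20.0))
print(latency_from_bandwidth(128, 40.0))
```

If the same per-message time falls out of several small message sizes, that time is a reasonable estimate of the 0-byte latency, which is exactly the cross-check described in the quoted text.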

> question is, can someone using MX please run Pallas pingpong and
> publish the raw chart?

And please don't forget to turn write combining (WC) on.


Patrick Geoffray
Myricom, Inc.
