[Beowulf] New HPCC results, and an MX question
Greg Lindahl
lindahl at pathscale.com
Tue Jul 19 17:55:55 PDT 2005
First off, I'd like to announce that we've started publishing public
benchmark data for InfiniPath; for example, we've now got a data point
listed at the HPC Challenge website:
http://icl.cs.utk.edu/hpcc/hpcc_results.cgi
In particular I'd like to point out our "Random Ring Latency" number
of 1.31 usec. This benchmark is a lot more realistic than the usual
ping-pong latency, because it uses all the cpus on all the nodes,
instead of just 1 cpu on each of 2 nodes. If you examine other
interconnects, you'll note that many of them get a much worse random
ring latency than ordinary ping-pong.
Second, I have a question about Myrinet MX performance. Myricom has
better things to do than answer my performance queries (no surprise,
every company prefers to answer customer queries first). With GM,
Myricom published the raw output from the Pallas benchmark, and that
was very useful for doing comparisons. With MX, Myricom hasn't
published the raw data, but they did publish graphs. The claimed
0-byte latency is 2.6 usec, with no explanation of what benchmark was
used. The graph at:
http://www.myri.com/myrinet/performance/MPICH-MX/
for Pallas pingpong latency is on a log/log scale, so it's hard to see
what latency it got without having the detailed results, which are not
provided. But if you look at the bandwidth chart, it's semi-log. So at
32 byte payloads, the bandwidth looks to me like it's 9 or 10
MB/s. That corresponds to a latency of 3.1 to 3.4 usec. The
bandwidths for 64 bytes and 128 bytes seem to support this number, too.
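A minimal sketch of that arithmetic, for anyone who wants to redo it with their own chart readings. It assumes the chart's "MB/s" means 2^20 bytes per second, as in Pallas raw output; the 9 and 10 MB/s inputs are the eyeballed readings mentioned above, not published data, and the function name is just for illustration.

```python
MIB = 2**20  # assumption: "MB/s" on the chart means 2^20 bytes per second

def latency_usec(payload_bytes, bandwidth_mb_per_s):
    """Time per message (usec) implied by a pingpong bandwidth reading."""
    bytes_per_usec = bandwidth_mb_per_s * MIB / 1e6
    return payload_bytes / bytes_per_usec

# Eyeballed chart readings for 32-byte payloads (not raw data).
for bw in (9.0, 10.0):
    print(f"32 bytes at {bw:.0f} MB/s -> {latency_usec(32, bw):.2f} usec")
```

If MB were taken as 10^6 bytes instead, the implied latencies would come out about 5% higher, so the unit assumption does not change the conclusion much.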
So, the question is, am I full of it? Wait, don't answer that! The
question is, can someone using MX please run Pallas pingpong and
publish the raw chart?
To be fair, we don't have these details for InfiniPath up on our
website yet, so here's what we get on our 2.6 GHz dual-cpu systems.
We're about 30 nanoseconds slower on this pingpong than the number
we get from the osu_latency pingpong.
-- greg
#---------------------------------------------------
# Benchmarking PingPong
# ( #processes = 2 )
# ( 30 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         1.35         0.00
            1         1000         1.36         0.70
            2         1000         1.36         1.41
            4         1000         1.34         2.85
            8         1000         1.35         5.66
           16         1000         1.59         9.58
           32         1000         1.63        18.75
           64         1000         1.68        36.38
          128         1000         1.79        68.20
          256         1000         2.04       119.47
          512         1000         2.53       192.73
         1024         1000         3.51       277.86
         2048         1000         5.57       350.71
         4096         1000         7.46       523.45
         8192         1000        11.70       668.02
        16384         1000        21.49       727.14
        32768         1000        42.89       728.55
        65536          640        88.76       704.17
       131072          320       161.42       774.36
       262144          160       308.38       810.68
       524288           80       582.13       858.92
      1048576           40      1146.71       872.06
      2097152           20      2253.23       887.62
      4194304           10      4452.19       898.43
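
The Mbytes/sec column above can be cross-checked against the t[usec] column. This is a rough sketch under the assumption that the benchmark counts one Mbyte as 2^20 bytes; the three rows are copied from the table, and small mismatches are expected because the printed times are rounded to two decimals.

```python
MIB = 2**20  # assumption: 1 Mbyte = 2^20 bytes in the benchmark output

def mbytes_per_sec(nbytes, t_usec):
    """Bandwidth implied by moving nbytes in t_usec microseconds."""
    return nbytes / (t_usec * 1e-6) / MIB

# (bytes, t[usec], reported Mbytes/sec), copied from the table above
rows = [(32, 1.63, 18.75), (1024, 3.51, 277.86), (4194304, 4452.19, 898.43)]

for nbytes, t_usec, reported in rows:
    computed = mbytes_per_sec(nbytes, t_usec)
    print(f"{nbytes:>8} bytes: computed {computed:.2f}, reported {reported:.2f}")
```

The same relation run in reverse is what lets a latency be read off a bandwidth chart when only graphs are available.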