Dolphin Wulfkit

Patrick Geoffray patrick at
Thu May 2 19:00:57 PDT 2002

Joachim Worringen wrote:

> To compare Wulfkit (SCI + ScaMPI) with "the other" well-known (probably
> better known) cluster interconnect (Myrinet2000 + MPICH-GM), I have put
> results of some PMB runs on a P4-i860 cluster with both, SCI and
> Myrinet, at .
> Please note that the i860 is quite a bad chipset for both SCI and
> Myrinet as it has disappointing PCI performance for DMA and PIO as well
> (relative to the performance potential of 64bit/66Mhz PCI). Anyway, look
> at the numbers if you are interested. I really don't want to discuss the
> relevance of such benchmarks and the type of the test platform (again);
> anybody is free to interpret them by their own standards, requirements
> and experience. 

But Joachim, you know what happens: Scali folks put it on their 
propaganda web site and now someone has to explain how misleading it is.

First of all, yes, the i860 is a lousy chipset; nobody uses it for 
clusters (and nobody should). It has never been very clear whether there 
was a bug in its prefetching, and you have to change the default PCI 
register settings to get a little bit of performance (there is a FAQ 
entry about that on the Myricom web site).

For the record, I have attached PMB results on 8 dual PIII with 
Serverworks LE (Supermicro 370DLE) and Myrinet 2000 with Lanai9 at 200 
MHz. You will see that the behaviour is very different when the PCI is 
not a crappy bottleneck. Welcome to the fun world of benchmarks, where 
there are as many results as there are machines on the planet...

I would also like to question the relevance of the SMP results: in the 
SCI case, it seems that the first 2 processes are on 2 different nodes, 
while in the Myrinet case they are on the same node. How can you see 
that? In the Ping-Pong test on SMP, SCI yields roughly the same 
bandwidth as in the UP test. That would mean the i860 is able to sustain 
2x160 = 320 MB/s of traffic when both processes are on the same node and 
the data makes a round trip via the PCI, and that would NEVER EVER 
happen. In the Myrinet case, however, the bandwidth is roughly divided 
by 2, meaning that the PCI is used twice because both processes are on 
the same node. A possible explanation would be that SCI uses shared 
memory for intra-node comms, but then the latency would be less than 2 
us (it is 1.5 us with MPICH-GM and shared memory enabled), and that is 
not the case. So please, don't compare apples to oranges.
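The round-trip arithmetic above can be sketched out. This is just a 
back-of-envelope illustration; the 160 MB/s figure stands in for the 
uniprocessor ping-pong bandwidth in the reported results, not an exact 
number:

```python
# If two processes on the same SMP node exchange messages through the
# NIC, every byte crosses the PCI bus twice (out to the NIC and back),
# so the bus must sustain twice the observed ping-pong bandwidth.

def implied_pci_traffic(observed_bw_mb_s):
    """PCI throughput implied when an intra-node ping-pong goes via the NIC."""
    return 2 * observed_bw_mb_s

# SCI's SMP numbers match its ~160 MB/s UP bandwidth, which would imply
# 320 MB/s through an i860 PCI bus -- implausible on that chipset:
print(implied_pci_traffic(160))

# Myrinet's SMP bandwidth instead drops to roughly half the UP figure,
# consistent with the PCI really being traversed twice per message:
print(160 / 2)
```

Either SCI bypasses the PCI for intra-node traffic (shared memory, which 
the ~2 us latency argues against), or the two processes were not on the 
same node.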

The reported performance of Myrinet on the i860 is also not what I can 
get from my couple of dual P4 i860 machines. I cannot use them tonight 
but I will check tomorrow. Previous runs reported definitely less than 
10 us and more than 110 MB/s, so I don't know how Alex configured it, 
but it's not at all what I get on the same type of machine.

Finally, PMB does not give any information about the CPU footprint, and 
if you attended Tony's talk at IPDPS (Sorry Tony, but early morning is 
not a good time for me :-) ), you should understand the importance of 
the CPU overhead in real world message passing.

However, I would be very curious to see PMB results with SCI on a large 
cluster, 128 or 256 nodes.

Best regards


|   Patrick Geoffray, Ph.D.      patrick at
|   Myricom, Inc.      
|   Cell:  865-389-8852          685 Emory Valley Rd (B)
|   Phone: 865-425-0978          Oak Ridge, TN 37830
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pmb_PIII_PCI64C_SMP.txt.gz
Type: application/gzip
Size: 12266 bytes
Desc: not available
URL: <>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pmb_PIII_PCI64C_UP.txt.gz
Type: application/gzip
Size: 9303 bytes
Desc: not available
URL: <>