[Beowulf] Re: Re: Home beowulf - NIC latencies

Wed Feb 16 00:05:25 PST 2005

On Tue, Feb 15, 2005 at 09:53:37AM +0100, Joachim Worringen wrote:

> This is an interesting issue. If you look at what Greg mentioned about 
> dumb NICs (like InfiniPath, or SCI) and the latency numbers Ole posted 
> for ScaMPI on different interconnects (all(?) accessed through uDAPL), 
> you see that the dumb interface SCI has the lowest latency for both 
> pingpong and random, with random being about twice pingpong. In 
> contrast, the "smart" NIC Myrinet, which has much less CPU utilization, 
> has twice the pingpong latency, and a slightly worse random-to-pingpong 
> ratio.

I would make 2 comments about this:

First, you should be using the best MPI for each piece of hardware.
Hardware architects pick their interface with a software
implementation in mind. I don't expect any 3rd party MPI to get close
to PathScale's MPI latency on PathScale's hardware, unless the 3rd
party is flexible enough to change a lot of code.

Second, you really can't generalize about dumb NICs by looking at
SCI. SCI has a unique situation: its raw latency is much lower than
the MPI latency of all MPI implementations for it. I suspect no
hardware designer would be out to imitate that property! Both
InfiniPath and the Quadrics STEN (forgive me for classing this as
dumb, I happen to think dumb is a compliment...) get this right.

Third (you knew I couldn't keep to my promise of 2), I wouldn't make
any scaling generalizations based on a test with 16 nodes.  Even at
128-256 nodes the picture is quite different, and that's the sweet
spot that lots of today's clusters are at. So, if you want to make a
scaling generalization, you should be quoting 256-512 node results.

-- greg