[Beowulf] Re: Re: Home beowulf - NIC latencies

Joachim Worringen joachim at ccrl-nece.de
Tue Feb 15 00:53:37 PST 2005

Ashley Pittman wrote:
> If you had a bunch of sends to do to N remote processes then I'd expect
> you to post them in order (non-blocking) and wait for them all at the
> end, the time taken to do this should be (base_latency + ( (N-1) * M ))
> where M is the reciprocal of the "issue rate".  You can clearly see
> here that even for a small number of batched sends (even a 2d/3d nearest
> neighbour matrix) the issue rate (that is how little CPU the send call
> consumes) is at least as important as the raw latency.
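
To make the quoted estimate concrete, here is a minimal sketch of that
cost model, base_latency + (N-1) * M. The function name, the units
(microseconds) and the two sets of NIC numbers are assumptions chosen
purely for illustration; they are not measurements of any real
interconnect.

```python
def batched_send_time(n_sends, base_latency_us, issue_rate_per_us):
    """Estimate the time (in microseconds) for n_sends pipelined sends.

    Implements base_latency + (N - 1) * M, where M is the reciprocal
    of the issue rate (sends issued per microsecond).
    """
    if n_sends < 1:
        return 0.0
    m = 1.0 / issue_rate_per_us  # M: time between consecutive issued sends
    return base_latency_us + (n_sends - 1) * m


# Two hypothetical NICs (illustrative numbers only):
# one with low latency but a slow issue rate, one the other way round.
low_latency_nic = batched_send_time(6, base_latency_us=2.0, issue_rate_per_us=0.5)
fast_issue_nic = batched_send_time(6, base_latency_us=4.0, issue_rate_per_us=2.0)

print("low latency, slow issue:", low_latency_nic)
print("high latency, fast issue:", fast_issue_nic)
```

With six sends (for example a 3d nearest-neighbour exchange), the
hypothetical NIC with twice the base latency still finishes earlier
under this model, because the (N-1) * M term dominates.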

This is an interesting issue. If you look at what Greg mentioned about 
dumb NICs (like InfiniPath, or SCI) and the latency numbers Ole posted 
for ScaMPI on different interconnects (all(?) accessed through uDAPL), 
you see that the dumb interface SCI has the lowest latency for both 
pingpong and random, with random being about twice the pingpong latency. 
In contrast, the "smart" NIC Myrinet, which has much lower CPU utilization, 
has twice the pingpong latency, and a slightly worse random-to-pingpong 
ratio.

Why is this? Maybe better pipelining in SCI, because it's write-and-forget 
for the CPU, with 16 outstanding transactions on the network level, 
while Myrinet obviously behaves differently here (although GM should 
also use PIO writes to the NIC memory for small messages).

Then there is Infiniband, which has a much better random-to-pingpong 
ratio, which is striking.

Would be nice to see Quadrics or InfiniPath in this context.


Joachim Worringen - NEC C&C research lab St.Augustin
fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de
