[Beowulf] Re: Re: Home beowulf - NIC latencies

Fri Feb 11 14:11:14 PST 2005

Greg Lindahl wrote:

>On Fri, Feb 11, 2005 at 02:39:29PM -0600, Isaac Dooley wrote:
>
>>Using MPI_Isend() allows programs to not waste
>>CPU cycles waiting on the completion of a message transaction. This is
>>critical for some tightly coupled, fine-grained applications.
>
>We do pretty much the same thing for MPI_Send and MPI_Isend for small
>packets: they're nearly on the wire when the routine returns, and
>the subsequent MPI_Wait is a no-op. This is actually pretty common
>among MPI implementations.
>
>The problem with trying to generalize about what MPI calls do is that
>different implementations do different things with them. Reading
>the standard won't teach you much about implementations.
>
>-- greg
>
>  
>
Right. Small messages are where latency matters anyway. As the message size
dwindles, the remaining overhead is mostly intrinsic to the subroutine call
and unavoidable. What is to be done? The only choice is to squeeze out the
subroutine call itself with a different programming model (say, UPC) and a
memory and instruction set architecture that supports single-instruction
remote memory addressing (preferably pipelined, with a block/vector length
and stride option to hide latency). Additions like the STEN on the Quadrics
Elan4 and HyperTransport directly from remote processor cache are cluster
hardware morphs that take things in the direction of global address space
(GAS) systems like the Cray X1 and SGI Altix.
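
To put rough numbers on the argument above, here is a minimal back-of-envelope
cost model in C. It is only a sketch under simple assumptions: a message costs
a fixed per-call overhead plus a bandwidth term, and pipelined remote
references pay the full latency once and then complete one issue interval
apart. The function names and every parameter value are hypothetical, chosen
for illustration; they are not measurements of any interconnect or MPI
implementation.

```c
#include <stddef.h>

/* Time (microseconds) to move `bytes` when each message pays a fixed
 * per-call overhead plus a bandwidth term (bytes per microsecond).
 * Assumed model: t = overhead + bytes / bandwidth. */
double message_time_us(size_t bytes, double overhead_us, double bytes_per_us)
{
    return overhead_us + (double)bytes / bytes_per_us;
}

/* Fraction of the total message time spent in the fixed overhead. */
double overhead_fraction(size_t bytes, double overhead_us, double bytes_per_us)
{
    return overhead_us / message_time_us(bytes, overhead_us, bytes_per_us);
}

/* n remote references issued one at a time: each one waits out the
 * full latency before the next is issued. */
double blocking_time_us(size_t n, double latency_us)
{
    return (double)n * latency_us;
}

/* n remote references issued back to back (pipelined): the first pays
 * the full latency, the rest complete one issue interval apart. */
double pipelined_time_us(size_t n, double latency_us, double issue_gap_us)
{
    if (n == 0) {
        return 0.0;
    }
    return latency_us + (double)(n - 1) * issue_gap_us;
}
```

With an assumed 1 us overhead and 1000 bytes/us of bandwidth,
overhead_fraction(8, 1.0, 1000.0) is about 0.992, so an 8-byte message is
almost all fixed cost, while for a 1,000,000-byte message the fraction drops
to about 0.001. Likewise, 64 references at an assumed 2 us latency cost 128 us
when issued one at a time, but about 2.63 us when pipelined with a 0.01 us
issue interval.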

rbw