[Beowulf] Re: Re: Home beowulf - NIC latencies

Patrick Geoffray patrick at myri.com
Mon Feb 14 23:48:47 PST 2005

Hi Rob,

Rob Ross wrote:

> The last two Scalable OS workshops (the only two I've had a chance to 
> attend), there was a contingent of people that are certain that MPI isn't 
> going to last too much longer as a programming model for very large 

Were they advocating shared-memory paradigms, one-sided operations, 
something more "natural" to program with? I have heard that before :-)

> systems.  The issue, as they see it, is that MPI simply imposes too much 
> latency on communication, and because we (as MPI implementors) cannot 
> decrease that latency fast enough to keep up with processor improvements, 
> MPI will soon become too expensive to be of use on these systems.

This is just wrong. How much of the latency in high-speed interconnects 
is due to MPI? Very, very little. The core of it is in the hardware (IO 
bus, NICs, crossbars and wires). Doing pure RDMA in hardware is easy for 
the chip designers, but it's hell for irregular applications, where you 
don't actually know where to remotely read or write.

> Also, there is additional overhead in the Isend()/Wait() pair over the
> simple Send() (two function calls rather than one, allocation of a Request
> structure at the least) that means that a naive attempt at overlapping
> communication and computation will result in a slower application.  So
> that doesn't surprise me at all.

What is the cost of one function call and an allocation in a slab ? At 
several GHz, 50 ns ? And most of the time, blocking calls are 
implemented on top of non-blocking routines, so the CPU overhead is the 

> I think that the theme from this thread should be that "it's a good thing
> that we have more than one MPI implementation, because they all do
> different things best."

I would say having more than one MPI implementation is a bad thing as 
long as you cannot easily replace one with another. Let's define a 
standard MPI header and a standard API for spawning and such, and then 
having more than one implementation will actually be manageable. That 
would also remove the need for swiss-army-knife MPI implementations 
that want to support all interconnects with the same binary. These 
implementations are, IMHO, a bad thing as they work at the lowest common 
denominator and are in essence inefficient for all devices.

While we are at it, here is my wish list for the next MPI specs:

a) only non-blocking calls. If there are no blocking calls, nobody will 
use them.
b) non-blocking calls for collectives too, there is no excuse. Yes, even 
an asynchronous barrier.
c) ban of the ANY_SOURCE wildcard (MPI_ANY_SOURCE): a world of 
optimization goes away with this convenience.
d) throw away the user-defined datatypes, or at least restrict them to 
regular strides.
e) get rid of one-sided communications: anyone who is serious about them 
uses something like ARMCI or UPC or even low-level vendor interfaces.
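
To make item (b) concrete, here is a minimal sketch of what an asynchronous 
barrier looks like from the caller's side: one call that starts the barrier 
and returns at once, and a later call that completes it, with independent 
work in between. It is a toy model built on Python threads, not MPI. The 
class SplitPhaseBarrier, its arrive() and wait() methods, and the run() 
helper are hypothetical names invented for this illustration, and the 
barrier is single-use.

```python
import threading


class SplitPhaseBarrier:
    """Toy single-use barrier split into a non-blocking arrive() and a
    blocking wait(). Hypothetical illustration only, not an MPI API."""

    def __init__(self, parties):
        self._parties = parties
        self._arrived = 0
        self._lock = threading.Lock()
        self._all_here = threading.Event()

    def arrive(self):
        # Non-blocking half: record this party's arrival and return at once.
        with self._lock:
            self._arrived += 1
            if self._arrived == self._parties:
                self._all_here.set()

    def wait(self, timeout=None):
        # Blocking half: returns True once every party has called arrive().
        return self._all_here.wait(timeout)


def run(parties=4):
    barrier = SplitPhaseBarrier(parties)
    results = [None] * parties

    def worker(rank):
        barrier.arrive()                   # start the barrier
        local = sum(range(rank * 1000))    # independent work, overlapped
        ok = barrier.wait(timeout=5.0)     # complete the barrier
        results[rank] = (ok, local)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The point of the split is that the work between arrive() and wait() does 
not have to stall while slower participants catch up, which is the same 
overlap the non-blocking point-to-point calls are meant to allow.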

Rob, you are politically connected, could you make it happen, please ?


Patrick Geoffray
Myricom, Inc.
