[Beowulf] Re: Re: Home beowulf - NIC latencies
patrick at myri.com
Mon Feb 14 23:48:47 PST 2005
Rob Ross wrote:
> The last two Scalable OS workshops (the only two I've had a chance to
> attend), there was a contingent of people that are certain that MPI isn't
> going to last too much longer as a programming model for very large
Were they advocating shared-memory paradigms, one-sided operations, or
something more "natural" to program with? I have heard that before :-)
> systems. The issue, as they see it, is that MPI simply imposes too much
> latency on communication, and because we (as MPI implementors) cannot
> decrease that latency fast enough to keep up with processor improvements,
> MPI will soon become too expensive to be of use on these systems.
This is just wrong. How much of the latency in a high-speed interconnect
is due to MPI? Very, very little. The core of it is in the hardware (I/O
bus, NICs, crossbars and wires). Doing pure RDMA in hardware is easy for
the chip designers, but it's hell for irregular applications, where you
actually don't know in advance where to remotely read or write.
> Also, there is additional overhead in the Isend()/Wait() pair over the
> simple Send() (two function calls rather than one, allocation of a Request
> structure at the least) that means that a naive attempt at overlapping
> communication and computation will result in a slower application. So
> that doesn't surprise me at all.
What is the cost of one function call and one allocation in a slab? At
several GHz, 50 ns? And most of the time, blocking calls are
implemented on top of the non-blocking routines, so the CPU overhead is
essentially the same.
> I think that the theme from this thread should be that "it's a good thing
> that we have more than one MPI implementation, because they all do
> different things best."
I would say having more than one MPI implementation is a bad thing as
long as you cannot easily replace one with another. Let's define a
standard MPI header and a standard API for spawning and such; then
having more than one implementation will actually be manageable. That
would also remove the need for swiss-army-knife MPI implementations
that want to support every interconnect with the same binary. These
implementations are, IMHO, a bad thing, as they work at the lowest
common denominator and are in essence inefficient for all devices.
While we are at it, here is my wish list for the next MPI specs:
a) only non-blocking calls. If there are no blocking calls, nobody will
use them; blocking semantics can always be recovered with a Wait().
b) non-blocking calls for the collectives too; there is no excuse. Yes,
even an asynchronous barrier.
c) ban of the MPI_ANY_SOURCE wildcard: a world of optimization goes away
with this convenience.
d) throw away the user-defined datatypes, or at least restrict them to
simple contiguous layouts.
e) get rid of one-sided communications: anyone serious about them will
use something like ARMCI or UPC, or even the low-level vendor interfaces.
Rob, you are politically connected; could you make it happen, please?