[Beowulf] Re: Re: Home beowulf - NIC latencies

Tue Feb 15 06:51:34 PST 2005

> Throw away compatibility. If you keep the legacy API, you have no 
> incentive for change. I don't want MPI-3, I want MPI-light. We are 
> against a wall because the MPI spec was too rich and developers took the 
> lazy path.

Inertia is a powerful thing.  Billions of dollars have been invested in
MPI codes.  Changing that will not be easy (or cheap).  This is not as
simple as the move from vector machines to distributed memory; there
wasn't nearly as much accumulated code then, and even that transition
hurt.

> It's used because it's there, there is no other reason. If you don't 
> know who sends you what in a message passing application, then you 
> cannot get either performance or robustness. If you really cannot do 
> otherwise (and I don't believe that), you can always use unexpected 
> messages (post the receive after Probe()ing). That's ugly, but you get 
> what you deserve :-)

That just isn't true.  If I don't know how many messages I will get, or
from whom, but I can bound that number, then I should prepost those
receives.  This is particularly true of a typical physics code that runs
for days and does thousands of time steps: you can maintain a circular
queue of preposted receives and repost each one as it completes.
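A minimal sketch of that circular queue, written in plain C so it compiles without an MPI library. The names (recv_ring, ring_post, ring_complete), the bound NSLOTS and the size MSG_BYTES are hypothetical. In a real code ring_post would call MPI_Irecv with MPI_ANY_SOURCE and a fixed tag, and the simulated delivery in ring_complete would be an MPI_Wait on the head slot's request. Because MPI matches a message against the earliest-posted pending receive that matches it, identical receives fill in posting order, which is what lets a simple ring work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NSLOTS    4   /* hypothetical bound on messages in flight */
#define MSG_BYTES 64  /* hypothetical maximum message size */

/* One preposted receive. In an MPI code this struct would also hold
   the MPI_Request returned by MPI_Irecv. */
struct recv_slot {
    char buf[MSG_BYTES];
    int  posted;
};

/* Ring of preposted receives; 'head' is the slot the next matching
   message will land in. */
struct recv_ring {
    struct recv_slot slot[NSLOTS];
    size_t head;
    unsigned long reposts;
};

/* Stand-in for MPI_Irecv(buf, MSG_BYTES, MPI_BYTE, MPI_ANY_SOURCE, tag, ...). */
static void ring_post(struct recv_ring *r, size_t i) {
    assert(i < NSLOTS && !r->slot[i].posted);
    memset(r->slot[i].buf, 0, MSG_BYTES);
    r->slot[i].posted = 1;
}

/* Prepost every slot once, before the first time step. */
static void ring_init(struct recv_ring *r) {
    memset(r, 0, sizeof *r);
    for (size_t i = 0; i < NSLOTS; i++)
        ring_post(r, i);
}

/* Simulate one message of 'len' bytes landing in the head slot (in MPI:
   the earliest-posted matching receive completes), copy it to 'out',
   repost the slot, and advance the ring. Returns the slot index used. */
static size_t ring_complete(struct recv_ring *r, const char *msg,
                            size_t len, char *out) {
    size_t i = r->head;
    assert(len <= MSG_BYTES && r->slot[i].posted);
    memcpy(r->slot[i].buf, msg, len);   /* "delivery" into the slot */
    r->slot[i].posted = 0;              /* the request has completed */
    memcpy(out, r->slot[i].buf, len);   /* application consumes data */
    ring_post(r, i);                    /* repost immediately */
    r->reposts++;
    r->head = (i + 1) % NSLOTS;
    return i;
}
```

After ring_init, each time step drains whatever has arrived and reposts the slot at once, so NSLOTS receives stay outstanding; as long as no more than NSLOTS messages are in flight at a time, none of them arrives unexpected.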

> If you don't use user-defined datatypes, then you don't need it and it 
> should not be there in the first place. It's a temptation, it's too 
> easy. No, there is no way to implement them efficiently unless they are 
> regular, and this is what I am willing to keep: strided types with long 
> segments. Everything else leads to memory copies. The developer should 
> wipe his own bottom instead of asking the message passing interface to 
> work around bad data layout. Sending a column of blocks, yes, that's 
> regular stride and it makes a lot of sense. Sending non-contiguous 
> irregular structures? As we used to say in France, $100 and a chocolate 
> bar with that?

The user should always expose as much opportunity for optimization as
possible to the MPI layer.  For example, a load-store architecture like
the X1 (not what I am advocating for MPI performance, mind you) could do
excellent datatype processing.  Would you rather the user do the
gather/scatter themselves and prevent the MPI library from doing it?
Granted, hardly anyone uses irregular MPI datatypes, because
implementations handled them so badly for so long...  but it would be
nice if that layout information were exposed to MPI.
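For the regular case both sides agree on: a column of a row-major N x N matrix of doubles is what MPI_Type_vector(N, 1, N, MPI_DOUBLE, &coltype) describes (count, blocklength, stride). The plain-C sketch below, with hypothetical names vec_layout, vec_pack and column_layout, shows that those three integers are all a gather routine needs. Handing them to MPI instead of packing by hand leaves the library, or hardware like the X1, free to decide how to perform the gather.

```c
#include <assert.h>
#include <stddef.h>

/* The three integer arguments of MPI_Type_vector, in units of elements
   (doubles here). This sketch only handles non-negative strides. */
struct vec_layout {
    int count;        /* number of blocks */
    int blocklength;  /* elements in each block */
    int stride;       /* elements between the starts of consecutive blocks */
};

/* Gather the elements described by 'l', starting at 'base', into the
   contiguous buffer 'out' (what MPI would otherwise do internally, or
   avoid entirely). Returns the number of elements copied. */
static size_t vec_pack(const double *base, struct vec_layout l, double *out) {
    size_t n = 0;
    assert(l.count >= 0 && l.blocklength >= 0 && l.stride >= 0);
    for (int b = 0; b < l.count; b++) {
        const double *blk = base + (size_t)b * (size_t)l.stride;
        for (int e = 0; e < l.blocklength; e++)
            out[n++] = blk[e];
    }
    return n;
}

/* One column of a row-major n x n matrix: n blocks of 1 element, n apart.
   Pass base = &m[col]. */
static struct vec_layout column_layout(int n) {
    struct vec_layout l = { n, 1, n };
    return l;
}
```

In an MPI code the equivalent would be to build the type once with MPI_Type_vector, call MPI_Type_commit, and send one element of it starting at &m[col]; packing by hand instead forces a copy that the library might have avoided.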

					Keith