[Beowulf] Q: IB message rate & large core counts (per node)?

Mark Hahn hahn at mcmaster.ca
Tue Feb 23 13:57:23 PST 2010


> Coalescing produces a meaningless answer from the message rate
> benchmark. Real apps don't get much of a benefit from message
> coalescing, but (if they send smallish messages) they get a big
> benefit from a good non-coalesced message rate.

in the interests of less personal/posturing/pissing, let me ask:
where does the win from coalescing come from?  I would have thought
that coalescing is mainly a way to reduce interrupts, a technique
that's familiar from ethernet interrupt mitigation, NAPI, even 
basic disk scheduling.
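to make the batching idea concrete, here's a toy sketch (python, purely illustrative -- real coalescing lives in the MPI library / NIC driver, and the threshold is my invention):

```python
# Toy sketch of send-side message coalescing: accumulate small
# sends, flush as one wire transaction, amortize per-message cost.
class Coalescer:
    def __init__(self, max_batch=8):
        self.max_batch = max_batch   # flush after this many messages
        self.pending = []            # small messages awaiting a batch
        self.wire_ops = 0            # "transactions" that hit the wire

    def send(self, msg):
        self.pending.append(msg)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self):
        if self.pending:
            self.wire_ops += 1       # one doorbell/interrupt per batch
            self.pending = []

c = Coalescer(max_batch=8)
for i in range(64):
    c.send(f"msg{i}")
c.flush()
print(c.wire_ops)   # 8 wire transactions instead of 64
```

the win, if there is one, is exactly the interrupt-mitigation win: one injection (and one completion) per batch rather than per message.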

to me it looks like the key factor would be "propagation of desire" - 
when the app sends a message and will do nothing until the reply,
it probably doesn't make sense to coalesce that message.  otoh it
would be interesting if user-level code could express non-urgency as
well.  my guess is the other big factor is LogP-like parameters:
once the gap dominates, piggybacking small messages is nearly free.
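back-of-envelope version of the gap argument (LogP-style, with made-up numbers just to show the shape of it):

```python
# If the per-injection cost (max of overhead o and gap g) dominates
# small-message cost, coalescing k messages pays that cost once.
o = 1.0e-6      # per-message send overhead, seconds (assumed)
g = 2.0e-6      # gap: min interval between injections (assumed)
k = 16          # small messages to send

separate  = k * max(o, g)    # each message pays the full injection cost
coalesced = max(o, g)        # one injection for the whole batch
# (ignores the extra payload bytes, which are cheap for small messages)
print(separate / coalesced)  # -> 16.0, i.e. k-fold fewer injection slots
```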

assuming MPI is the application-level interface, are there interesting
issues related to knowing where to deliver messages?  I don't have a
good understanding of where things stand WRT things like QP usage
(still N*N?  is N node count or process count?) or unexpected messages.
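the arithmetic is why the question matters.  rough sketch with invented cluster sizes (and my possibly-wrong understanding that classic RC transport needs one QP per peer *process*, while XRC-style sharing needs one per remote *node* per rank):

```python
# Rough QP-count arithmetic for all-to-all reliable connections.
cores_per_node = 16
nodes = 1024
P = cores_per_node * nodes            # total MPI ranks = 16384

# classic RC: every rank keeps a QP to every other rank
qps_per_rank = P - 1
qps_per_node = cores_per_node * qps_per_rank
print(qps_per_node)                   # -> 262128 QPs on one node

# XRC-style sharing (one QP per remote node per rank), as I
# understand it, would cut that to:
xrc_per_node = cores_per_node * (nodes - 1)
print(xrc_per_node)                   # -> 16368
```

so whether N is process count or node count is a factor-of-cores_per_node difference in per-node QP state, which is exactly the "large core counts" part of the question.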

now that I'm inventorying ignorance, I don't really understand why 
RDMA always seems to be presented as a big hardware issue.  wouldn't
it be pretty easy to define an ethernet- or IP-level protocol to do remote puts,
gets, even test-and-set or reduce primitives, where the interrupt handler
could twiddle registered blobs of user memory on the target side?
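something like this toy sketch, say -- packet format, names, and the missing bounds/permission checks are all invented, and a real version would live in the driver's interrupt/soft-irq path, not python:

```python
# Toy software-RDMA protocol: the target-side handler applies
# puts/gets/test-and-set directly to pre-registered memory regions.
registered = {}   # region_id -> bytearray (registered by the app)

def register(region_id, size):
    registered[region_id] = bytearray(size)

def handle_packet(op, region_id, offset, payload=None):
    """What the target's interrupt handler would do per packet."""
    buf = registered[region_id]        # checks omitted for brevity
    if op == "put":                    # remote write into the region
        buf[offset:offset + len(payload)] = payload
        return None
    if op == "get":                    # remote read; payload = length
        return bytes(buf[offset:offset + payload])
    if op == "tas":                    # test-and-set on one byte
        old = buf[offset]
        buf[offset] = 1
        return old

register("r0", 64)
handle_packet("put", "r0", 0, b"hello")
print(handle_packet("get", "r0", 0, 5))   # -> b'hello'
print(handle_packet("tas", "r0", 8))      # -> 0 (byte is now 1)
```

obviously the hard parts -- registration, protection, ordering, and doing it at line rate without an interrupt per packet -- are where the hardware argument comes in; the sketch is just the semantics.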

regards, mark hahn.


