[Beowulf] Performance characterising a HPC application
Scott Atchley
atchley at myri.com
Fri Mar 30 18:31:29 PDT 2007
On Mar 26, 2007, at 1:04 PM, Gilad Shainer wrote:
> When Mellanox refers to transport offload, it means full transport
> offload - for all transport semantics. InfiniBand, as you probably
> know, provides RDMA AND Send/Receive semantics, and in both cases
> you can do zero-copy operations.
>
> This full flexibility provides the programmer with the ability to
> choose the best semantics for his use. Some programmers choose
> Send/Receive and some RDMA. It all depends on their application.
> From your response, I see that QLogic does not provide this kind
> of flexibility.
Gilad,
I have seen you make that point many times. This may be a silly
question, but are latency and throughput equivalent for both APIs for
large and small messages?
I ask because I wrote the ports of Lustre and PVFS2 for MX and I
spent a lot of time looking at their existing IB code. I see them use
Send/Recv for small and/or unexpected messages. Both use IB write for
large payloads.
Although both use IB write (one-sided, no?) for the large payload,
both require one or two small Send/Recv messages to serve as RTS and
CTS before they can initiate the one-sided transfer. In effect, they
have to write their own Send/Recv (two-sided) semantics on top of
IB's RDMA.
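
To sketch what I mean (the names below are illustrative placeholders,
not actual verbs calls, and real code also needs memory registration
and completion handling), the large-message path in both file systems
ends up shaped roughly like this:

    /* sender: has a large payload to deliver */
    send_small(peer, RTS, payload_len);         /* Send/Recv: request to send   */
    cts = recv_small(peer);                     /* Send/Recv: clear to send,
                                                   carries remote addr + rkey   */
    rdma_write(peer, payload, payload_len,
               cts.remote_addr, cts.rkey);      /* one-sided bulk transfer      */
    send_small(peer, FIN);                      /* RDMA write is silent on the
                                                   target, so announce it       */

    /* receiver */
    rts = recv_small(peer);                     /* matched against posted recv  */
    buf = register_buffer(rts.payload_len);     /* pin and register the target  */
    send_small(peer, CTS, buf.addr, buf.rkey);  /* advertise the target buffer  */
    recv_small(peer);                           /* wait for the FIN             */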
If Send/Recv performance is on par with RDMA on IB, why not use that
API for large messages? Why re-write Send/Recv every time they use
RDMA? The code to implement PVFS2 on MX is over 30% smaller than the
IB code because I did not have to re-write Send/Recv.
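
For comparison, with a two-sided interface that handles matching,
unexpected messages, and its own internal rendezvous (hypothetical
names below, loosely in the spirit of MX's matched isend/irecv), the
same transfer in the file system code shrinks to:

    /* sender */
    net_isend(peer, payload, payload_len, match_tag, &req);
    net_wait(&req);

    /* receiver */
    net_irecv(any_source, buf, buf_len, match_tag, &req);
    net_wait(&req);

The RTS/CTS/FIN bookkeeping moves into the network library, which is
essentially what I mean by not having to re-write Send/Recv.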
Scott