[Beowulf] Performance characterising a HPC application
Scott Atchley
atchley at myri.com
Fri Mar 30 18:31:29 PDT 2007
On Mar 26, 2007, at 1:04 PM, Gilad Shainer wrote:
> When Mellanox refers to transport offload, it means full transport
> offload - for all transport semantics. InfiniBand, as you probably
> know, provides RDMA AND Send/Receive semantics, and in both cases
> you can do zero-copy operations.
>
> This full flexibility provides the programmer with the ability to
> choose the best semantics for his use. Some programmers choose
> Send/Receive and some RDMA. It all depends on their application.
> From your response, I see that QLogic does not provide this kind
> of flexibility.
Gilad,
I have seen you make that point many times. This may be a silly
question, but are latency and throughput equivalent for both APIs for
large and small messages?
I ask because I wrote the ports of Lustre and PVFS2 for MX and I
spent a lot of time looking at their existing IB code. I see them use
Send/Recv for small and/or unexpected messages. Both use IB write for
large payloads.
Although both use IB write (one-sided, no?) for the large payload,
both require one or two small Send/Recv messages to serve as RTS and
CTS before they can initiate the one-sided transfer. In effect, they
have to write their own Send/Recv (two-sided) semantics on top of
IB's RDMA.
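
To sketch what I mean (the names below are illustrative placeholders,
not actual verbs calls, and real code also needs memory registration
and completion handling), the large-message path in both file systems
ends up shaped roughly like this:

    /* sender: has a large payload to deliver */
    send_small(peer, RTS, payload_len);         /* Send/Recv: request to send   */
    cts = recv_small(peer);                     /* Send/Recv: clear to send,
                                                   carries remote addr + rkey   */
    rdma_write(peer, payload, payload_len,
               cts.remote_addr, cts.rkey);      /* one-sided bulk transfer      */
    send_small(peer, FIN);                      /* RDMA write is silent on the
                                                   target, so announce it       */

    /* receiver */
    rts = recv_small(peer);                     /* matched against posted recv  */
    buf = register_buffer(rts.payload_len);     /* pin and register the target  */
    send_small(peer, CTS, buf.addr, buf.rkey);  /* advertise the target buffer  */
    recv_small(peer);                           /* wait for the FIN             */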
If Send/Recv performance is on par with RDMA on IB, why not use that
API for large messages? Why re-write Send/Recv every time they use
RDMA? The code to implement PVFS2 on MX is over 30% smaller than the
IB code because I did not have to re-write Send/Recv.
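
For comparison, with a two-sided interface that handles matching,
unexpected messages, and its own internal rendezvous (hypothetical
names below, loosely in the spirit of MX's matched isend/irecv), the
same transfer in the file system code shrinks to:

    /* sender */
    net_isend(peer, payload, payload_len, match_tag, &req);
    net_wait(&req);

    /* receiver */
    net_irecv(any_source, buf, buf_len, match_tag, &req);
    net_wait(&req);

The RTS/CTS/FIN bookkeeping moves into the network library, which is
essentially what I mean by not having to re-write Send/Recv.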
Scott