[Beowulf] What is rdma, ofed, verbs, psm etc?
cap at nsc.liu.se
Tue Sep 19 09:24:22 PDT 2017
On Tue, 19 Sep 2017 09:27:55 -0600
Faraz Hussain <info at feacluster.com> wrote:
> I have never understood what these acronyms are. I've been involved
> with HPC on the applications side for many years and hear these
> terms pop up now and then. I've read through the wikipedia pages but
> still do not understand what they mean. Can someone give a very high
> level overview of what they are and how they relate to HPC?
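rdma: remote direct memory access, a way to move data directly from the
memory of one node into the memory of another, without involving the
remote node's CPU or operating system in the transfer. Supported by, for
example, Infiniband.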
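To make the "remote" and "direct" parts concrete, here is a toy Python
model. It is an illustration only, not an RDMA implementation, and the
names ToyNode, register, send and rdma_write are made up for this sketch.
In a conventional two-sided transfer the receiving node's CPU copies the
data in. With an RDMA-style write the data lands in a memory region the
receiver registered beforehand, and the receiver's CPU does no work for
that transfer.

```python
class ToyNode:
    """Toy model of a cluster node. NOT real RDMA: it only illustrates who
    does the work when data moves between two nodes' memories."""

    def __init__(self, size):
        self.memory = bytearray(size)  # stands in for the node's RAM
        self.cpu_work = 0              # how many times this node's CPU had to act
        self.regions = {}              # key -> (offset, length) exposed for RDMA

    def register(self, key, offset, length):
        # Real RDMA pins and registers the buffer with the network adapter first
        self.regions[key] = (offset, length)

    def receive(self, data, offset):
        # Conventional path: the receiving CPU copies the incoming data itself
        self.cpu_work += 1
        self.memory[offset:offset + len(data)] = data


def send(dst, data, offset):
    """Two-sided transfer: the destination CPU takes part."""
    dst.receive(data, offset)


def rdma_write(dst, key, data, at=0):
    """One-sided transfer: the 'adapter' writes into a registered region
    of dst without calling into dst's CPU at all."""
    offset, length = dst.regions[key]
    if at < 0 or at + len(data) > length:
        raise ValueError("write falls outside the registered region")
    start = offset + at
    dst.memory[start:start + len(data)] = data


# Usage
node_b = ToyNode(64)
node_b.register("buf", offset=16, length=32)
rdma_write(node_b, "buf", b"hello")   # node_b's CPU is not involved
send(node_b, b"world", offset=0)      # node_b's CPU does the copy
print(bytes(node_b.memory[16:21]), node_b.cpu_work)  # b'hello' 1
```

Only the two-sided send bumps the receiver's CPU counter; that is the
property that lets RDMA hardware keep latency low and leave the CPUs free
for computation.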
ofed: the OpenFabrics Enterprise Distribution, a software distribution of
network drivers, libraries and utilities typically used by
users/applications to run on Infiniband (and the other networks
supported by ofed). In short: Infiniband drivers.
verbs: one of many protocols supported on Infiniband providing "native
performance"; strictly speaking it is the low-level programming
interface (libibverbs) to the hardware. For example, an MPI library can
use verbs to transmit data efficiently over Infiniband. Other protocols
supported on Infiniband these days include UCX, MXM, OFI, DAPL,
rsockets, ...
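How software uses verbs can be sketched with another toy model, assuming
a very simplified view of the real interface: the application posts work
requests to a queue pair and later polls a completion queue, so transfers
proceed asynchronously. In C with libibverbs the corresponding calls are
ibv_post_send() and ibv_poll_cq(); the Python class below (ToyQueuePair,
with a hypothetical adapter_progress() standing in for the hardware)
only mimics that pattern and does not talk to any real device.

```python
from collections import deque


class ToyQueuePair:
    """Toy model of the verbs work-queue pattern; not a libibverbs binding."""

    def __init__(self, peer_memory):
        self.send_queue = deque()        # work requests posted by the application
        self.completion_queue = deque()  # completions reported by the "adapter"
        self.peer_memory = peer_memory   # remote buffer this queue pair writes into

    def post_send(self, wr_id, data, offset):
        # Returns immediately; the transfer happens later, asynchronously
        self.send_queue.append((wr_id, data, offset))

    def adapter_progress(self):
        # Hypothetical stand-in for the network adapter draining the send queue
        while self.send_queue:
            wr_id, data, offset = self.send_queue.popleft()
            self.peer_memory[offset:offset + len(data)] = data
            self.completion_queue.append((wr_id, "success"))

    def poll_cq(self, max_entries):
        # Non-blocking: return up to max_entries completions, possibly none
        done = []
        while self.completion_queue and len(done) < max_entries:
            done.append(self.completion_queue.popleft())
        return done


# Usage
remote = bytearray(16)
qp = ToyQueuePair(remote)
qp.post_send(1, b"hi", 0)
qp.post_send(2, b"mpi", 4)
print(qp.poll_cq(8))   # [] because nothing has completed yet
qp.adapter_progress()
print(qp.poll_cq(8))   # [(1, 'success'), (2, 'success')]
```

An MPI library built on verbs does essentially this bookkeeping for you:
it posts sends and receives, keeps computing, and polls for completions
when it needs to know a message has gone out or arrived.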
psm: Performance Scaled Messaging, a low-level protocol that runs
specifically on PathScale/TrueScale-class Infiniband hardware. On this
hardware psm is vastly more efficient than, for example, verbs. On
modern Omni-Path (OPA) hardware, psm2 is used in a similar fashion.
> Specifically, how does it relate to a simple hello world mpi program
> like this:
On an Infiniband cluster, when a job spans multiple nodes, the MPI
library can use verbs or psm to send data efficiently over the network.
The verbs and psm drivers may come from an installation of ofed on those
nodes (or not, if the OS's built-in Infiniband support is used).
Different MPI implementations (such as Open MPI, Intel MPI, MVAPICH2,
etc.) make different choices as to which protocols to use, depending on
many things (including which headers were available at compile time).
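A rough sketch of the kind of decision an MPI library makes, assuming a
hypothetical preference order and a simplified table of which transport
runs on which hardware. Real implementations such as Open MPI or
MVAPICH2 use their own, far more detailed logic; the names PREFERENCE,
USABLE_ON and pick_transport are made up here. The point is that the
result depends both on what was compiled into the library and on what
network hardware the node actually has.

```python
# Hypothetical preference order and hardware table, simplified for
# illustration; real MPI libraries apply their own selection rules.
PREFERENCE = ["psm2", "psm", "verbs", "tcp"]

USABLE_ON = {
    "psm2": {"omnipath"},
    "psm": {"truescale"},
    "verbs": {"infiniband", "truescale", "omnipath"},
    "tcp": {"infiniband", "truescale", "omnipath", "ethernet"},  # e.g. IP over the fabric
}


def pick_transport(built_in, hardware):
    """Return the first preferred transport that was compiled into the MPI
    library (built_in) and works on this node's network hardware."""
    for transport in PREFERENCE:
        if transport in built_in and hardware in USABLE_ON[transport]:
            return transport
    raise RuntimeError("no usable transport for " + hardware)


# Same TrueScale hardware, two different builds of the same MPI library:
print(pick_transport({"psm", "verbs", "tcp"}, "truescale"))  # psm
print(pick_transport({"verbs", "tcp"}, "truescale"))         # verbs
```

The second build falls back to verbs simply because psm support was not
compiled in, which is one way the same hello world program can end up
performing very differently on identical hardware.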