[Beowulf] Infiniband: MPI and I/O?

Mark Hahn hahn at mcmaster.ca
Thu May 26 09:18:18 PDT 2011


> Wondering if anyone out there is doing both I/O to storage as well as
> MPI over the same IB fabric.

I would say that is the norm.  We certainly connect local storage 
(Lustre) to nodes via the same fabric as MPI.  Gigabit is completely
inadequate for modern nodes, so the only alternatives would be 10G
ethernet or a secondary IB fabric, both quite expensive propositions, no?
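
(Back of the envelope: gigabit tops out around 125 MB/s for the whole
node, while QDR IB gives you roughly 3 GB/s of usable bandwidth - a
single core streaming a checkpoint at a couple hundred MB/s has already
saturated GigE.)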

I suppose if your cluster does nothing but IO-light serial/EP jobs,
you might think differently.

> Following along in the Mellanox User's
> Guide, I see a section on how to implement the QOS for both MPI and my
> lustre storage.  I am curious though as to what might happen to the
> performance of the MPI traffic when high I/O loads are placed on the
> storage.

To me, the real question is whether your IB fabric is reasonably close 
to full bisection bandwidth (and/or whether your storage nodes are
sensibly placed, topologically).
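
That said, if you do go down the QoS road, the OpenSM side is fairly
modest.  Very roughly (a sketch only - the GUIDs below are placeholders,
and the exact keyword set is in the Mellanox OFED / OpenSM QoS docs),
you enable QoS in opensm and hand it a policy file that maps traffic
onto service levels, e.g. putting everything destined for the Lustre
server HCAs on its own SL:

    qos-ulps
        default                                    : 0   # MPI and everything else on SL 0
        any, target-port-guid <OSS/MDS port GUIDs> : 1   # Lustre traffic on SL 1
    end-qos-ulps

and then weight SL 0 vs SL 1 in the VL arbitration tables however you
like.  (If you'd rather tag the MPI side explicitly, Open MPI's
btl_openib_ib_service_level MCA parameter lets you pin it to a given SL.)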

> In our current implementation, we are using blades which are 50%
> blocking (2:1 oversubscribed) when moving from a 16 blade chassis to
> other nodes.  Would trying to do storage on top dictate moving to a
> totally non-blocking fabric?

How much inter-chassis MPI do you do?  How much I/O do you do?
IB has a small MTU (2-4 KB packets), so flows interleave at fine
granularity and I don't really see why mixed traffic would be a big
problem.  Of course, IB also doesn't do all that wonderfully with
hotspots.  But isn't this mostly an empirical question you can
answer by direct measurement?
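
For what it's worth, here's the sort of crude harness I mean - a sketch
only, with placeholder hostnames, paths and mpirun syntax (adjust for
your MPI stack), and assuming you already have the OSU micro-benchmarks
and IOR built somewhere:

#!/usr/bin/env python3
# Rough sketch: compare MPI point-to-point bandwidth on a quiet fabric
# vs. under a concurrent Lustre write load.  Hostnames, paths and the
# mpirun invocation are placeholders - adjust for your site/MPI stack.
import re
import subprocess

OSU_BW = "/opt/osu/osu_bw"            # placeholder: OSU micro-benchmarks
IOR    = "/opt/ior/bin/ior"           # placeholder: IOR parallel I/O benchmark
LUSTRE = "/lustre/scratch/ib_test"    # placeholder: a directory on Lustre

def mpi_bw(hosts):
    """Run osu_bw between two hosts and return the peak MB/s it reports."""
    out = subprocess.run(["mpirun", "-np", "2", "-host", hosts, OSU_BW],
                         capture_output=True, text=True, check=True).stdout
    rates = [float(f[1]) for f in (l.split() for l in out.splitlines())
             if len(f) == 2 and re.match(r"\d", f[0])]
    return max(rates)

quiet = mpi_bw("node001,node002")     # baseline: two nodes in different chassis

# kick off a streaming write to Lustre from other nodes while re-measuring
# (IOR: -w write-only, -t transfer size, -b block size per task, -o test file)
load = subprocess.Popen(["mpirun", "-np", "8", "-host", "node003,node004",
                         IOR, "-w", "-t", "1m", "-b", "8g",
                         "-o", LUSTRE + "/ior.dat"])
busy = mpi_bw("node001,node002")
load.wait()

print("osu_bw peak: %.0f MB/s quiet, %.0f MB/s under I/O load" % (quiet, busy))

If the under-load number barely moves, the 2:1 blocking plus shared
storage traffic probably isn't worth losing sleep over; if it craters,
that's your answer too.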

regards, mark hahn.


