[Beowulf] Can one Infiniband net support MPI and a parallel file system?

Jason Clinton jclinton at advancedclustering.com
Wed Aug 6 11:31:09 PDT 2008

On Tue, Aug 5, 2008 at 4:25 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:
> Is anybody using Infiniband to provide both
> MPI connection and parallel file system services on a Beowulf cluster?
> I thought to have a storage node that would
> serve a parallel file system to the beowulf nodes over IB
> (something like an NFS on steroids).
> The same IB net would also work as the MPI interconnect.
> Is this design possible?

We have customers doing Lustre and MPI with IB successfully. They
still have a good old gigabit management network to fall back on: it
makes sense to keep this around because gigabit is so low-cost by
comparison and it's rock-solid. But you should know that you need
more than a single storage node providing disk I/O before you start
to see the performance benefit. Generally, I/O from a single node can
barely fill a gigabit link. To exceed gigabit-level performance,
you'd need more than one storage node delivering storage to the Lustre
filesystem.
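The storage-node arithmetic above can be made concrete with a small back-of-the-envelope sketch. It assumes a raw gigabit line rate of 125 MB/s (ignoring protocol overhead) and uses a hypothetical per-node disk throughput of 100 MB/s; `nodes_to_exceed` is an illustrative helper, not part of Lustre or any other tool, so substitute figures measured on your own hardware.

```python
# Back-of-the-envelope sketch: how many storage nodes are needed before their
# combined disk throughput exceeds what one gigabit link can carry?
# The per-node figure below is a hypothetical placeholder, not a measurement.
import math

# 1 Gb/s line rate expressed in MB/s, ignoring Ethernet/TCP overhead.
GIGABIT_MB_PER_S = 1000 / 8


def nodes_to_exceed(link_mb_per_s: float, per_node_mb_per_s: float) -> int:
    """Smallest number of storage nodes whose combined throughput is strictly
    greater than the link capacity (hypothetical illustrative helper)."""
    if per_node_mb_per_s <= 0:
        raise ValueError("per-node throughput must be positive")
    return math.floor(link_mb_per_s / per_node_mb_per_s) + 1


if __name__ == "__main__":
    per_node = 100.0  # hypothetical sustained disk throughput of one storage node, MB/s
    n = nodes_to_exceed(GIGABIT_MB_PER_S, per_node)
    print(f"{n} storage nodes x {per_node:.0f} MB/s = {n * per_node:.0f} MB/s, "
          f"vs. a {GIGABIT_MB_PER_S:.0f} MB/s gigabit link")
```

Under these assumed numbers a single node stays below the gigabit ceiling, so a faster interconnect only pays off for storage once several nodes serve I/O in parallel.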

> On a small cluster, does it require two separate IB physical networks (cards
> and switch),
> or can it be done with a single IB card per node and one switch?

It can be done with a single IB network.

> Is this design efficient?

Generally speaking, MPI programs will not be fetching/writing data
from/to storage at the same time they are making MPI calls, so there
tends to be little contention to worry about at the node level.

> Are there other practical and cost-effective alternatives to this idea?

If the cluster is small enough, gigabit with a shared filesystem is
preferred, since IB's low latency has relatively little effect on the
biggest source of latency in any storage system: the physical disks.
IB only really starts to make sense once you cross the gigabit
bandwidth barrier, and that barrier is not crossed very often in a
small cluster.
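To illustrate why disk latency dominates, here is a rough sketch that compares the network's share of a single storage request's latency over gigabit and over IB. All three latency figures are hypothetical order-of-magnitude placeholders chosen for illustration, not measurements of any particular hardware.

```python
# Rough sketch: what fraction of one storage request's latency is spent on
# the network rather than the disk? All latency values are hypothetical,
# order-of-magnitude placeholders.

DISK_SEEK_US = 8000.0  # hypothetical: ~8 ms of seek plus rotational delay
GIGE_US = 50.0         # hypothetical gigabit Ethernet round trip, microseconds
IB_US = 5.0            # hypothetical InfiniBand round trip, microseconds


def network_share(network_us: float, disk_us: float) -> float:
    """Fraction of the total (network + disk) latency due to the network."""
    return network_us / (network_us + disk_us)


if __name__ == "__main__":
    for name, net in (("GigE", GIGE_US), ("IB", IB_US)):
        total = net + DISK_SEEK_US
        share = network_share(net, DISK_SEEK_US)
        print(f"{name}: total {total:.0f} us, network share {share:.2%}")
```

With these assumed values the network is well under 1% of the request time on either fabric, so switching storage traffic to IB barely changes it. An MPI message, by contrast, has no disk in its path, so the interconnect latency is the whole cost; that is why IB still matters for MPI traffic.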

> Would this type of design work with GigE instead of IB?

Yes, but you'd still want IB for low latency MPI traffic.

> I confess I know nothing about parallel file systems and IB.
> So, please forgive me if my questions are nonsense.

Lustre and Panasas are certainly both stable options in this area.

Jason D. Clinton
Advanced Clustering Technologies, Inc.
