[Beowulf] IB switches: managed or not?

Frank Gruellich frank.gruellich at mapsolute.com
Tue Mar 6 02:17:26 PST 2007


Andrew Robbie (GMail) wrote:
 > I am building a small (~16) node cluster with an IB interconnect. I need to
 > decide whether I will buy a cheaper, dumb switch and run OpenSM, or get a
 > more expensive switch with a built in subnet manager. The largest this
 > system would ever grow is 32 nodes (two 24 port switches).
 > Various vendors (integrators, not switch OEMs) have stated to me that
 > managed switches are the go, and that OpenSM is (a) buggy, and (b) very
 > time consuming to set up.

It's not _that_ buggy, and setup is pretty straightforward.  But it
lacks several features you'd really like in big systems.  For 24 nodes
or fewer you can go with a simple switch and OpenSM.  For 32 nodes you
can put 16 nodes on each switch and use 8 cables for the switch
interconnect.  So in theory you should have 1/2 bisection bandwidth (8
inter-switch links for the 16 nodes on each side).  But OpenSM
configures IB forwarding rather statically at startup and never adjusts
it to the actual usage of the links, and it copes rather poorly with
"hotplug" changes in topology.  So it is possible that some links are
overused while others are not.  Nevertheless, you can still find 24
nodes in your 32-node cluster that communicate nonblocking (if the
remaining 8 stay silent), but I don't know a simple way to get this
information from OpenSM or the switch.  You can write a simple MPI
program to benchmark it.
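
The effect of static forwarding on the 16 + 8 layout can be sketched in
a few lines of Python.  This is only a toy model: the rule that picks an
uplink (destination modulo number of uplinks) is an assumption made up
for illustration and is not how OpenSM computes its tables, and the
names bisection_ratio, uplink_for and link_load are hypothetical.

```python
from collections import Counter

NODES_PER_SWITCH = 16   # nodes attached to each switch in the layout above
UPLINKS = 8             # cables between the two switches


def bisection_ratio(nodes_per_switch=NODES_PER_SWITCH, uplinks=UPLINKS):
    """Inter-switch links available per node on one side of the cut."""
    return uplinks / nodes_per_switch


def uplink_for(dst, uplinks=UPLINKS):
    # ASSUMPTION: a made-up static rule (destination modulo uplink count).
    # It only stands in for "fixed at startup"; it is not OpenSM's algorithm.
    return dst % uplinks


def link_load(flows, nodes_per_switch=NODES_PER_SWITCH, uplinks=UPLINKS):
    """Count flows per uplink for the (src, dst) pairs that cross switches."""
    load = Counter()
    for src, dst in flows:
        if src // nodes_per_switch != dst // nodes_per_switch:
            load[uplink_for(dst, uplinks)] += 1
    return load


# Nodes 0..15 sit on the first switch, nodes 16..31 on the second.
even = [(i, 16 + i) for i in range(8)]
uneven = list(zip(range(8), [16, 24, 17, 25, 18, 26, 19, 27]))
```

Here bisection_ratio() gives 0.5.  With the "even" pattern every uplink
carries one flow; with "uneven", four uplinks carry two flows each and
four stay idle, although both patterns have eight senders and eight
distinct receivers.  A real measurement still needs a benchmark on the
actual fabric.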

In addition, the versions of OpenSM I know sometimes crash silently
(which does not affect the running fabric), so you should monitor it in
some way (you can restart it whenever you want).  Finally, I have to
admit that these are all real-life experiences without any in-depth
knowledge of OpenSM or even InfiniBand.
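
A minimal sketch of such monitoring, assuming the process is named
"opensm", that pgrep is available on the node, and that running
"opensm -B" starts it as a daemon on your installation (replace that
with your distribution's init script if it has one).  The function names
are hypothetical.

```python
import subprocess
import time


def opensm_running():
    """Return True if a process named exactly 'opensm' exists (via pgrep)."""
    result = subprocess.run(["pgrep", "-x", "opensm"],
                            stdout=subprocess.DEVNULL)
    return result.returncode == 0


def start_opensm():
    # ASSUMPTION: 'opensm -B' starts the subnet manager in the background
    # on this system; adjust to however OpenSM is started locally.
    subprocess.run(["opensm", "-B"], check=False)


def watchdog_step(is_running=opensm_running, restart=start_opensm):
    """Check once and restart if needed. Returns True if a restart was issued."""
    if is_running():
        return False
    restart()
    return True


def watch(interval=30.0):
    """Poll forever, restarting OpenSM whenever it has disappeared."""
    while True:
        if watchdog_step():
            print("opensm was not running; restart issued")
        time.sleep(interval)
```

Calling watch() from a small script started at boot would be enough for
a cluster of this size.  The check and the restart are passed in as
arguments so the logic can be tried without a real OpenSM installation.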

So, in conclusion, I would suggest going with a simple 24-port switch
and OpenSM for now.  If you upgrade to more than 24 nodes, you should add
a more advanced switch.  In my experience you can easily mix Mellanox
switches with those formerly known as TopSpin; I don't know about other

As one more hint, you should reconsider whether you need that many nodes
for a single job.  If you limit the number of nodes per job to 24, you
can easily go with two dumb 24-port switches for up to 48 nodes, and the
nodes within each subcluster can communicate nonblocking.  But of course
this way no node of one subcluster can communicate with a node of the
other, and you need a
resource management system able to assign nodes of subcluster to one
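
The allocation rule for two unconnected subclusters can be sketched as
follows.  This is not the interface of any real resource manager;
pick_nodes and the node names are hypothetical, and picking the smallest
pool that fits is just one possible policy.

```python
def pick_nodes(free, wanted):
    """Pick `wanted` nodes that all hang off the same switch.

    free   -- dict mapping subcluster name to a list of free node names
    wanted -- number of nodes the job asks for
    Returns a list of node names, or None if no single subcluster fits.
    """
    # Only subclusters with enough free nodes qualify; a job must never
    # span both switches because there is no link between them.
    candidates = [(len(nodes), name)
                  for name, nodes in free.items()
                  if len(nodes) >= wanted]
    if not candidates:
        return None
    # Hypothetical policy: use the smallest pool that still fits, so the
    # larger pool stays available for bigger jobs.
    _, name = min(candidates)
    return free[name][:wanted]


free = {
    "switchA": ["a%02d" % i for i in range(24)],
    "switchB": ["b%02d" % i for i in range(10)],
}
```

With this state, a request for 8 nodes is served from switchB, a request
for 16 nodes from switchA, and a request for 30 nodes is refused even
though 34 nodes are free in total.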

Kind regards,
Mapsolute GmbH
Frank Gruellich
Map24 Systems and Networks

Duesseldorfer Strasse 40a
65760 Eschborn

Phone: +49 6196 77756-414
Fax:   +49 6196 77756-100

