[Beowulf] choosing a high-speed interconnect

Mark Hahn hahn at physics.mcmaster.ca
Tue Oct 12 22:40:03 PDT 2004


>   There are multiple 128 node (and greater) IB systems that are stable
> and are being used for production apps.  The #7 top500 machine from

I thank you for this street-level information!  it's frustrating
to only know a technology based on marketing...

> RIKEN is using IB and has been in production for over six months.  My
> cluster at Sandia (about 128 nodes) is being used for IB R&D and

still, 128 nodes is fairly small these days.  would you characterize
your applications as fairly bandwidth-intensive?  I know that many
of the apps that run at the really big weapons-related labs tend to
emphasize latency to an extreme degree, but perhaps your codes are
not like that?

> >300 nodes that are for production use.  All run great under Linux, and
> you have multiple IB vendors to choose from (Voltaire, Topspin,
> InfiniCon, and Mellanox).

well, aren't all of those just minor modifications of the same
Mellanox chip?  that's what I meant by "not-really-multi-vendor".
the IB world would like to compare itself to the ethernet world,
but it's a very, very long way from being really vendor-independent.

> Almost all of the IB software development is
> done under Linux first and then ported to other OSes.  

very interesting!  do you mean user-level IB software and middleware?
I had the impression (circa OLS in July) that there was no real
unification of the linux IB stacks, and that there were significant
problems with how windows-centric the code was.

>    QP scaling isn't as critical an issue if the MPI implementation sets
> up the connections as needed (kind of a lazy connection setup).  Why
> set up all-to-all QP connectivity if the MPI implements all-to-all
> or collectives as tree-based pt2pt algorithms?

that sounds reasonable, but does it work out well?  I guess it would 
depend mainly on whether the actual collective groups change frequently and
are reused.
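
to make the reuse question concrete, here is a toy sketch (not any real
MPI or IB verbs code).  it assumes a plain binomial-tree broadcast and
models a "connection" as nothing more than an entry in a set; the names
LazyRank, binomial_bcast and simulate are made up for illustration.  it
only counts how many distinct peers each rank ends up connected to when
connections are created on first use.

```python
class LazyRank:
    """Toy stand-in for one MPI rank; a 'connection' models an IB queue pair."""

    def __init__(self, rank):
        self.rank = rank
        self.peers = set()  # peers this rank has actually connected to

    def connect(self, other):
        # Lazy setup: the pair is created on first use and reused afterwards.
        self.peers.add(other)


def binomial_bcast(ranks, root=0):
    """Record the connections a binomial-tree broadcast from `root` needs."""
    n = len(ranks)
    mask = 1
    while mask < n:
        # Relative ranks 0..mask-1 already hold the data; each forwards once.
        for rel in range(mask):
            if rel + mask < n:
                src = (rel + root) % n
                dst = (rel + mask + root) % n
                ranks[src].connect(dst)
                ranks[dst].connect(src)  # the receiver holds its end too
        mask <<= 1


def max_peers(ranks):
    """Largest number of distinct peers held by any single rank."""
    return max(len(r.peers) for r in ranks)


def simulate(n, roots):
    """Run one broadcast per entry in `roots`; return the worst-case peer count."""
    ranks = [LazyRank(i) for i in range(n)]
    for root in roots:
        binomial_bcast(ranks, root)
    return max_peers(ranks)


if __name__ == "__main__":
    n = 128
    print("full mesh, peers per rank:", n - 1)
    print("same root, 10 broadcasts: ", simulate(n, [0] * 10))
    print("every rank takes a turn:  ", simulate(n, range(n)))
```

in this model, with a fixed root on 128 ranks no rank ever needs more
than 7 connections (log2 of 128), versus 127 for a full mesh, and
repeating the broadcast adds nothing.  once the root moves around, the
per-rank count grows, which is exactly the "do the groups change and get
reused" question.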

> Network congestion on
> larger clusters can be reduced by using source based adaptive
> (multipath) routing instead of the standard IB static routing.  

interesting, again!  in the most recent visit by S&M people from 
an IB vendor, they claimed that there was no problem and that any
reasonably smart switch would have a routing manager smart enough
to prevent the non-problem.

>   Also remember that IB has a lot more field experience than the latest
> Myricom hardware and MX software stack.  

to me, "recent myricom" means e-cards, which I, perhaps naively,
think are more of a known quantity than anything IB.  and I haven't
managed to lay hands on MX yet <sniff>.

I'm really glad to hear early adopters of IB speak up; I still claim
that they actually are early adopters, though ;)

regards, mark hahn.



