[Beowulf] choosing a high-speed interconnect


Mark Hahn hahn at physics.mcmaster.ca
Tue Oct 12 22:40:03 PDT 2004


>   There are multiple 128 node (and greater) IB systems that are stable
> and are being used for production apps.  The #7 top500 machine from

I thank you for this street-level information!  it's frustrating
to only know a technology based on marketing...

> RIKEN is using IB and has been in production for over six months.  My
> cluster at Sandia (about 128 nodes) is being used for IB R&D and

still, 128 nodes is fairly small these days.  would you characterize
your applications as fairly bandwidth-intensive?  I know that many
of the apps that run at the really big weapons-related labs tend to
emphasize latency to an extreme degree, but perhaps your codes are
not like that?

> >300 nodes that are for production use.  All run great under Linux, and
> you have multiple IB vendors to choose from (Voltaire, Topspin,
> InfiniCon, and Mellanox).

well, aren't all of those just minor modifications of the same
Mellanox chip?  that's what I meant by "not-really-multi-vendor".
the IB world would like to compare itself to the ethernet world,
but it's a very, very long way from being really vendor-independent.

> Almost all of the IB software development is
> done under Linux first and then ported to other OSes.  

very interesting!  do you mean user-level IB software and middleware?
I had the impression (circa OLS in July) that there was no real 
unification of linux IB stacks, and significant problems with 
windows-centricness of the code.

>    QP scaling isn't as critical an issue if the MPI implementation sets
> up the connections as needed (kinda of a lazy connection setup).  Why
> set up an all-to-all QP connectivity if the MPI implements an all-to-all
> or collectives as tree based pt2pt algorithms.

that sounds reasonable, but does it work out well?  I guess it would
depend mainly on how often the actual collective groups change and
whether they are reused.
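
to put rough numbers on the lazy-setup argument, here is a minimal sketch.
it assumes a binomial-tree broadcast rooted at rank 0 and one connection
(QP) per peer; the function names are made up for illustration and are not
taken from any MPI implementation.

```python
def binomial_tree_peers(rank, size):
    """Ranks that `rank` exchanges messages with in a binomial-tree
    broadcast rooted at rank 0 (its parent plus its children)."""
    peers = set()
    mask = 1
    # The lowest set bit of `rank` identifies its parent in the tree.
    while mask < size:
        if rank & mask:
            peers.add(rank - mask)
            break
        mask <<= 1
    # Every lower bit position is a potential child.
    mask >>= 1
    while mask > 0:
        if rank + mask < size:
            peers.add(rank + mask)
        mask >>= 1
    return peers


def connections_per_rank(size):
    """Return (eager, lazy_max): connections each rank holds with eager
    all-to-all setup, versus the most any rank opens lazily for the tree."""
    eager = size - 1
    lazy_max = max(len(binomial_tree_peers(r, size)) for r in range(size))
    return eager, lazy_max


if __name__ == "__main__":
    for n in (128, 300, 1024):
        print(n, connections_per_rank(n))
```

for 128 ranks this gives 127 connections per rank when set up eagerly,
versus at most 7 for the single tree.  the catch is the one raised above:
each new root or communicator adds its own set of peers, so the saving
holds only if the same groups keep getting reused.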

> Network congestion on
> larger clusters can be reduced by using source based adaptive
> (multipath) routing instead of the standard IB static routing.  

interesting, again!  in the most recent visit by S&M people from 
an IB vendor, they claimed that there was no problem and that any
reasonably smart switch would have a routing manager smart enough
to prevent the non-problem.
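
a toy model of what is being argued about, under loud assumptions: one
leaf switch with 4 uplinks, a "static" rule that picks the uplink from the
destination alone, and a "multipath" rule where the source also influences
the choice.  neither rule is how any real IB subnet manager or switch
works; the modulo rules and names are hypothetical.

```python
from collections import Counter


def static_uplink(src, dst, n_uplinks):
    # Destination-based choice: the source has no say (toy assumption).
    return dst % n_uplinks


def multipath_uplink(src, dst, n_uplinks):
    # Source-influenced choice: different sources may take different
    # paths to destinations that would otherwise collide (toy assumption).
    return (src + dst) % n_uplinks


def max_link_load(flows, choose, n_uplinks):
    """Largest number of flows sharing one uplink under rule `choose`."""
    load = Counter(choose(s, d, n_uplinks) for s, d in flows)
    return max(load.values())


# Four sources on one leaf, each sending to a destination whose
# address is a multiple of 4: the worst case for the static rule.
flows = [(src, 16 + 4 * src) for src in range(4)]

if __name__ == "__main__":
    print("static   :", max_link_load(flows, static_uplink, 4))
    print("multipath:", max_link_load(flows, multipath_uplink, 4))
```

with this traffic pattern the static rule puts all 4 flows on one uplink
and the multipath rule puts 1 on each.  a different pattern can make the
multipath rule collide too, so this only shows that the problem can
exist, not how often it does.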

>   Also remember that IB has a lot more field experience than the latest
> Myricom hardware and MX software stack.  

to me, "recent myricom" means e-cards, which I, perhaps naively,
think are more of a known quantity than anything IB.  and I haven't
managed to lay hands on MX yet <sniff>.

I'm really glad to hear early adopters of IB speak up; I still claim
that they actually are early adopters, though ;)

regards, mark hahn.



