[Beowulf] choosing a high-speed interconnect
Mark Hahn hahn at physics.mcmaster.ca
Tue Oct 12 22:40:03 PDT 2004
> There are multiple 128 node (and greater) IB systems that are stable
> and are being used for production apps. The #7 top500 machine from

I thank you for this street-level information! it's frustrating to only know a technology based on marketing...

> RIKEN is using IB and has been in production for over six months. My
> cluster at Sandia (about 128 nodes) is being used for IB R&D and

still, 128 nodes is fairly small these days. would you characterize your applications as fairly bandwidth-intensive? I know that many of the apps that run at really big weapons-related labs tend to emphasize latency to an extreme degree, but perhaps your codes are not like that?

> >300 nodes that are for production use. All run great under Linux, and
> you have multiple IB vendors to choose from (Voltaire, Topspin,
> InfiniCon, and Mellanox).

well, aren't all of those just minor modifications of the same Mellanox chip? that's what I meant by "not-really-multi-vendor". the IB world would like to compare itself to the eth world, but it's a very, very long way away from being really vendor-independent.

> Almost all of the IB software development is
> done under Linux first and then ported to other OSes.

very interesting! do you mean user-level IB software and middleware? I had the impression (circa OLS in July) that there was no real unification of Linux IB stacks, and significant problems with the Windows-centricness of the code.

> QP scaling isn't as critical an issue if the MPI implementation sets
> up the connections as needed (kind of a lazy connection setup). Why
> set up an all-to-all QP connectivity if the MPI implements an all-to-all
> or collectives as tree-based pt2pt algorithms?

that sounds reasonable, but does it work out well? I guess it would depend mainly on whether the actual collective groups change frequently and are reused.
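To put rough numbers on the QP-scaling argument, here is a toy count of connections per rank: eager all-to-all setup needs one QP per peer, while lazy setup driven by a binomial-tree broadcast only ever touches a parent and a handful of children. This is a sketch under stated assumptions, not any MPI's actual code: one reliable-connected QP per (rank, peer) pair, a fixed root, and the hypothetical helpers `binomial_bcast_peers`, `LazyConnections` and `count_qps` standing in for real verbs calls.

```python
"""Toy model: eager all-to-all QP setup vs. lazy, on-demand setup.

Assumption: one QP per (local rank, remote peer) pair; no real IB
verbs API is used, connections are just recorded in a set.
"""
from __future__ import annotations


def binomial_bcast_peers(rank: int, size: int, root: int = 0) -> set[int]:
    """Return the peers `rank` exchanges data with in a binomial-tree bcast."""
    rel = (rank - root) % size  # rank relative to the root
    peers: set[int] = set()
    mask = 1
    # Parent: the lowest set bit of `rel` tells us who sends to us.
    while mask < size:
        if rel & mask:
            peers.add((rel - mask + root) % size)
            break
        mask <<= 1
    # Children: rel + m for each power of two m below that bit.
    mask >>= 1
    while mask > 0:
        if rel + mask < size:
            peers.add((rel + mask + root) % size)
        mask >>= 1
    return peers


class LazyConnections:
    """Hypothetical cache: 'connect' to a peer on first use, then reuse."""

    def __init__(self) -> None:
        self.qps: set[int] = set()

    def get(self, peer: int) -> int:
        self.qps.add(peer)  # no-op if this peer is already connected
        return peer


def count_qps(size: int, repeats: int = 3) -> tuple[int, int]:
    """Return (eager QPs per rank, worst-case lazy QPs on any rank)."""
    eager = size - 1
    worst = 0
    for rank in range(size):
        conn = LazyConnections()
        for _ in range(repeats):  # repeated collectives reuse cached QPs
            for peer in binomial_bcast_peers(rank, size):
                conn.get(peer)
        worst = max(worst, len(conn.qps))
    return eager, worst
```

With a fixed root, the lazy count stays at about log2(N) per rank no matter how often the collective is repeated. If the root or the communicator changes from call to call, the peer set changes too and the cache keeps growing, which is exactly the reuse question raised in the reply.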
> Network congestion on
> larger clusters can be reduced by using source based adaptive
> (multipath) routing instead of the standard IB static routing.

interesting, again! in the most recent visit by S&M people from an IB vendor, they claimed that there was no problem and that any reasonably smart switch would have a routing manager smart enough to prevent the non-problem.

> Also remember that IB has a lot more field experience than the latest
> Myricom hardware and MX software stack.

to me, "recent myricom" means e-cards, which I, perhaps naively, think are more of a known quantity than anything IB. and I haven't managed to lay hands on MX yet <sniff>. I'm really glad to hear early adopters of IB speak up; I still claim that they actually are early adopters, though ;)

regards, mark hahn.
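The congestion point quoted in this message (static IB routing versus source-based adaptive, multipath routing) can be illustrated with a toy contention model. It is a sketch under loudly stated assumptions, not how any vendor's routing manager works: a single leaf switch with `k` uplinks, "static" routing that always sends traffic for destination `d` over uplink `d % k`, and an "adaptive" variant where the source picks the least-loaded uplink per flow. The function names are hypothetical, and the mechanism a real fabric would use to expose multiple paths is not modeled.

```python
"""Toy model of uplink contention: static vs. source-selected multipath.

Assumptions (hypothetical): one leaf switch with `k` uplinks; static
routing maps destination d to uplink d % k; the adaptive variant lets
the source choose the least-loaded uplink for each new flow.
"""
from __future__ import annotations

from collections import Counter


def static_load(dests: list[int], k: int) -> int:
    """Max number of flows sharing one uplink under d % k routing."""
    if not dests:
        return 0
    load = Counter(d % k for d in dests)
    return max(load.values())


def adaptive_load(dests: list[int], k: int) -> int:
    """Max flows per uplink when each flow takes the least-loaded uplink."""
    if not dests:
        return 0
    load = [0] * k
    for _ in dests:
        best = min(range(k), key=lambda u: load[u])
        load[best] += 1
    return max(load)
```

For example, four simultaneous flows to destinations 4, 8, 12 and 16 with `k = 4` all land on uplink 0 under the static rule (`static_load` gives 4), while the adaptive rule spreads them one per uplink (`adaptive_load` gives 1). With well-spread destinations the two rules behave the same, which is roughly the vendor's "non-problem" position; the disagreement is about how often unlucky traffic patterns occur in practice.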
