[Beowulf] choosing a high-speed interconnect
Joe Landman landman at scalableinformatics.com
Tue Oct 12 22:12:31 PDT 2004
Hi Matt:

Good to see you here ... :)

Matt L. Leininger wrote:

> There are multiple 128 node (and greater) IB systems that are stable
> and are being used for production apps. The #7 top500 machine from
> RIKEN is using IB and has been in production for over six months. My
> cluster at Sandia (about 128 nodes) is being used for IB R&D and

FWIW I used the nice setup that the AMD Dev center team have set up for benchmarking and testing. They have a nice IB platform there.

[...]

> QP scaling isn't as critical an issue if the MPI implementation sets
> up the connections as needed (kinda of a lazy connection setup). Why
> set up an all-to-all QP connectivity if the MPI implements an all-to-all
> or collectives as tree based pt2pt algorithms. Network congestion on
> larger clusters can be reduced by using source based adaptive
> (multipath) routing instead of the standard IB static routing.

On features utility ... (qp scaling, ...) (more to Mark than Matt here)

One of the things I remember as a "feature" much touted by the marketeers in the ccNUMA 6.5 IRIX days was page migration. This feature was supposed to ameliorate memory access hotspots in parallel codes. Enough hits on a page from a remote CPU, and whammo, off it went to the remote CPU. Turns out this was "A Bad Thing(TM)". There were many reasons for this, but in the end, page migration was little more than a marginal feature, best used in specific corner cases. Sure, someone will speak up and tell me how much pain it saved them, or made their code 3 orders of magnitude faster. I never saw that in general. I got better results from dplace, and large pages than I ever got from some of these other features.

The point is that there are often lots of features. Some of which might even be generally useful. Others might simply not be useful as the application level issues might be better served by other methods (as you pointed out).

IB works pretty nicely on clusters. So do many of the other interconnects. If you have latency bound or bandwidth bound problems, certainly it would be worth looking into. The original question was which to look at. First the need needs to be assessed, and from there, a reasonable comparison may be made. IB does look like it is drawing wide support right now, and is not single sourced.

It may be possible (though I haven't done much in the way of measurement) that tcp offload systems might help as well. If you are not extremely sensitive to latency, you might be able to use these. If you are, you should stick to the low latency fabrics.

> Also remember that IB has a lot more field experience than the latest
> Myricom hardware and MX software stack.

Joe
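As a rough illustration of the "assess the need first" step, here is a minimal sketch that uses the common latency + size/bandwidth cost model to guess whether a given message size is latency bound or bandwidth bound on a fabric. The function names and the fabric figures are placeholders invented for this example, not measurements of any real interconnect; plug in numbers measured on your own hardware with your own application's message sizes.

```python
def transfer_time(msg_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time for one point-to-point message: latency + size / bandwidth."""
    return latency_s + msg_bytes / bandwidth_bytes_per_s


def crossover_size(latency_s, bandwidth_bytes_per_s):
    """Message size at which the latency term equals the size / bandwidth term."""
    return latency_s * bandwidth_bytes_per_s


def classify(msg_bytes, latency_s, bandwidth_bytes_per_s):
    """Label a message size by which term of the model dominates."""
    if msg_bytes < crossover_size(latency_s, bandwidth_bytes_per_s):
        return "latency-bound"
    return "bandwidth-bound"


# Hypothetical (latency in seconds, bandwidth in bytes/second) pairs.
# These are illustrative placeholders only, not vendor or benchmark figures.
FABRICS = {
    "fabric_a": (5e-6, 800e6),
    "fabric_b": (50e-6, 100e6),
}

if __name__ == "__main__":
    for name, (lat, bw) in FABRICS.items():
        for size in (64, 4096, 1_000_000):
            t = transfer_time(size, lat, bw)
            print(f"{name}: {size} bytes, {classify(size, lat, bw)}, ~{t * 1e6:.1f} us")
```

If most of your application's messages fall well below the crossover size, latency is what you are paying for; if they sit well above it, bandwidth matters more and a higher latency option may be adequate. The model ignores congestion, overlap, and software overhead, so treat it as a first filter rather than a substitute for running the real code.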
