[Beowulf] choosing a high-speed interconnect
landman at scalableinformatics.com
Tue Oct 12 22:12:31 PDT 2004
Good to see you here ... :)
Matt L. Leininger wrote:
> There are multiple 128 node (and greater) IB systems that are stable
>and are being used for production apps. The #7 top500 machine from
>RIKEN is using IB and has been in production for over six months. My
>cluster at Sandia (about 128 nodes) is being used for IB R&D and
FWIW, I used the setup that the AMD Dev Center team built for
benchmarking and testing. They have a nice IB platform there.
> QP scaling isn't as critical an issue if the MPI implementation sets
>up the connections as needed (kind of a lazy connection setup). Why
>set up all-to-all QP connectivity if the MPI implements all-to-all
>and other collectives as tree-based pt2pt algorithms? Network congestion on
>larger clusters can be reduced by using source based adaptive
>(multipath) routing instead of the standard IB static routing.
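To put rough numbers on the lazy-connection argument quoted above, here
is a toy count in Python. It is a sketch under stated assumptions, not
any MPI's actual connection manager: it assumes one connection per
communicating pair of ranks and a binomial-tree broadcast, and
`LazyConnections` and the other names are hypothetical.

```python
from itertools import combinations


class LazyConnections:
    """Hypothetical stand-in for on-demand (lazy) connection setup:
    a connection between two ranks is created on their first message."""

    def __init__(self):
        self.pairs = set()

    def send(self, src, dst):
        key = frozenset((src, dst))
        if key not in self.pairs:
            # A real MPI would create and connect a QP here; we only record it.
            self.pairs.add(key)


def eager_all_to_all(nprocs):
    """Eager setup: every rank is connected to every other rank up front."""
    return {frozenset(p) for p in combinations(range(nprocs), 2)}


def binomial_tree_broadcast(nprocs, conns, root=0):
    """Broadcast from root with a binomial tree of point-to-point sends.
    In the round with distance `step`, ranks 0..step-1 (relative to root)
    already hold the data and each forwards it to the rank `step` ahead."""
    step = 1
    while step < nprocs:
        for rel in range(step):
            if rel + step < nprocs:
                src = (rel + root) % nprocs
                dst = (rel + step + root) % nprocs
                conns.send(src, dst)
        step *= 2


def max_connections_per_rank(pairs, nprocs):
    """Largest number of connections any single rank has to hold."""
    counts = [0] * nprocs
    for pair in pairs:
        for rank in pair:
            counts[rank] += 1
    return max(counts)


if __name__ == "__main__":
    n = 128
    eager = eager_all_to_all(n)
    lazy = LazyConnections()
    binomial_tree_broadcast(n, lazy)
    print("eager:", len(eager), "connections,",
          max_connections_per_rank(eager, n), "per rank (max)")
    print("lazy tree bcast:", len(lazy.pairs), "connections,",
          max_connections_per_rank(lazy.pairs, n), "per rank (max)")
```

For 128 ranks this reports 8128 connections (127 per rank) for eager
all-to-all setup versus 127 connections (at most 7 per rank, i.e.
log2 128) for the lazily built broadcast tree. Collectives that really
do need every pair would still open most connections; laziness only
saves the ones that are never used.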
On the utility of such features (QP scaling, ...), this is aimed more at Mark than at Matt:
One of the things I remember as a "feature" much touted by the
marketeers in the IRIX 6.5 ccNUMA days was page migration. This feature
was supposed to ameliorate memory access hotspots in parallel codes.
Enough hits on a page from a remote CPU, and whammo, off it went to the
memory local to that CPU.
It turns out this was "A Bad Thing(TM)". There were many reasons for this,
but in the end, page migration was little more than a marginal feature,
best used in specific corner cases. Sure, someone will speak up and
tell me how much pain it saved them, or how it made their code 3 orders of
magnitude faster. I never saw that in general. I got better results
from dplace and large pages than I ever got from some of these other
features.
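A toy model of the counter-based migration described above, in Python.
The threshold, the counter-reset rule, and all names are assumptions
for illustration, not IRIX's real policy. The ping-pong access pattern
is just one plausible way migration can hurt, not necessarily among the
reasons alluded to above.

```python
from collections import defaultdict


class MigratingPageTable:
    """Toy counter-based page migration on a ccNUMA machine.
    The threshold and bookkeeping are hypothetical, not IRIX's policy."""

    def __init__(self, threshold=4):
        self.threshold = threshold
        self.home = {}                        # page -> node whose memory holds it
        self.remote_hits = defaultdict(int)   # (page, node) -> remote access count
        self.migrations = 0

    def access(self, page, node):
        # First touch places the page in the accessing node's memory.
        home = self.home.setdefault(page, node)
        if node == home:
            return
        self.remote_hits[(page, node)] += 1
        if self.remote_hits[(page, node)] >= self.threshold:
            # Enough remote hits: move the page to the accessing node
            # and reset all remote counters for this page.
            self.home[page] = node
            self.migrations += 1
            for key in [k for k in self.remote_hits if k[0] == page]:
                del self.remote_hits[key]


def ping_pong(rounds, accesses_per_round, threshold):
    """Two nodes take turns hammering one shared page."""
    table = MigratingPageTable(threshold)
    table.access(page=0, node=0)              # node 0 touches it first
    for r in range(rounds):
        node = (r + 1) % 2                    # node 1, node 0, node 1, ...
        for _ in range(accesses_per_round):
            table.access(page=0, node=node)
    return table.migrations


if __name__ == "__main__":
    print("migrations:", ping_pong(rounds=10, accesses_per_round=4, threshold=4))
```

With these made-up parameters every phase change drags the page across
the machine (10 migrations in 10 rounds), so the page spends its time
moving rather than being used. Fixing placement up front, which is
roughly what dplace let you do, avoids that churn.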
The point is that there are often lots of features, some of which might
even be generally useful. Others might simply not be useful, as the
application-level issues might be better served by other methods (as you ...).
IB works pretty nicely on clusters. So do many of the other
interconnects. If you have latency-bound or bandwidth-bound problems,
it is certainly worth looking into.
The original question was which one to look at. First the need has to be
assessed; from there, a reasonable comparison can be made. IB does
look like it is drawing wide support right now, and it is not single
sourced. It may be possible (though I haven't done much in the way of
measurement) that TCP offload systems could help as well. If you are
not extremely sensitive to latency, you might be able to use these. If
you are, you should stick to the low-latency fabrics.
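One common first-order way to make the latency-bound versus
bandwidth-bound assessment is the linear (Hockney-style) model
t(n) = latency + n / bandwidth, where the crossover message size
latency × bandwidth is the point at which both terms contribute equally.
A Python sketch, with hypothetical numbers rather than measurements of
any particular fabric:

```python
def transfer_time_us(nbytes, latency_us, bandwidth_mb_s):
    """Linear model t(n) = latency + n / bandwidth.
    Uses 1 MB/s == 1 byte per microsecond (MB = 10**6 bytes)."""
    return latency_us + nbytes / bandwidth_mb_s


def crossover_bytes(latency_us, bandwidth_mb_s):
    """Message size at which the latency and bandwidth terms are equal."""
    return latency_us * bandwidth_mb_s


def dominant_term(nbytes, latency_us, bandwidth_mb_s):
    """Which term dominates the modeled transfer time for this size."""
    if nbytes < crossover_bytes(latency_us, bandwidth_mb_s):
        return "latency"
    return "bandwidth"


if __name__ == "__main__":
    # Hypothetical fabric parameters, for illustration only.
    latency_us, bw = 5.0, 800.0
    for n in (64, 4_000, 1_000_000):
        t = transfer_time_us(n, latency_us, bw)
        print(f"{n:>9} bytes: {t:10.2f} us, {dominant_term(n, latency_us, bw)}-dominated")
```

If most of an application's messages fall well below the crossover,
lower latency buys the most; well above it, bandwidth does. Plugging in
measured numbers for each candidate interconnect, together with the
application's own message-size profile, gives the comparison.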
> Also remember that IB has a lot more field experience than the latest
>Myricom hardware and MX software stack.