[Beowulf] Infiniband modular switches
Gilad Shainer
Shainer at mellanox.com
Thu Jun 26 17:16:52 PDT 2008
Patrick Geoffray wrote:
> > There are cases where adaptive routing will show a benefit, and this
> > is why we see the IB vendors add adaptive routing support as well.
> > But in general, the average effective bandwidth is much much higher
> > than the 40% you claim.
>
> Have a look at the slides 17 and 19 of the following set of
> slides (and slides 21 and 22 to illustrate my point above):
> http://www.openib.org/archives/spring2007sonoma/Monday%20April%2030/Leininger-Seager-Adaptive-Routing-OFA-Sonoma-2007-v03.pdf
>
Not only was I there, but I also had conversations afterwards. It is
a really "fair" comparison when you have different injection
rate/network capacity parameters. You can also take 10Mb and inject it
into a 10Gb/s network to show the same thing, and you can always create
a network pattern that shows what you want to show, but you prove nothing
here. I am not in favor of static routing only or adaptive routing only;
having both options is the most flexible solution.
> Hoefler et al. have shown an average effective bisection of
> ~40% on Infiniband (OMNeT simulations) in a paper submitted
> to Cluster2008. In a paper to be presented at Hot
> Interconnects this year, I have measured the effective
> bisection (SendRecv on random pairs) on a 512-node Myri-10G
> cluster (single enclosure, 32-port crossbars) under various
> routing implementations. Below is the link to pretty graphs
> with static and probing adaptive routing:
> http://patrick.geoffray.googlepages.com/staticvsadaptiverouting
>
> You can see that the worst case static routing goes quickly
> below 40%, but the average eventually goes there as well.
>
So what is your proof point here? I am sure you will find many cases
where static routing will do better (definitely on other interconnects)
and cases where adaptive routing will.
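
For anyone who wants to experiment with the random-pairs measurement
discussed above, here is a minimal sketch in Python. It is a toy model,
not the OMNeT simulation or the Myri-10G measurement. It assumes a
two-level Clos in which every leaf switch has hosts_per_leaf hosts and
one uplink to each of hosts_per_leaf spines, static routing that picks
the spine as (destination id modulo hosts_per_leaf), and a crude
bandwidth model where each flow gets 1/(number of flows on its busiest
link). The function names are made up for illustration.

```python
import random
from collections import Counter


def random_pairs(num_hosts, rng):
    """Pair hosts at random; each pair exchanges data in both directions."""
    perm = list(range(num_hosts))
    rng.shuffle(perm)
    flows = []
    for i in range(0, num_hosts - 1, 2):
        a, b = perm[i], perm[i + 1]
        flows.append((a, b))
        flows.append((b, a))
    return flows


def effective_bisection(hosts_per_leaf, flows):
    """Mean per-flow share of link bandwidth under static routing.

    Assumed model: hosts are numbered consecutively per leaf, and the
    spine used by a flow is fixed by its destination (dst % hosts_per_leaf).
    """
    h = hosts_per_leaf
    load = Counter()   # number of flows crossing each directed link
    routes = []
    for src, dst in flows:
        src_leaf, dst_leaf = src // h, dst // h
        if src_leaf == dst_leaf:
            routes.append(())      # stays inside one leaf switch
            continue
        spine = dst % h            # static, destination-based choice
        links = (("up", src_leaf, spine), ("down", spine, dst_leaf))
        for link in links:
            load[link] += 1
        routes.append(links)
    # A flow is limited by its most heavily shared link.
    shares = [1.0 / max((load[l] for l in links), default=1)
              for links in routes]
    return sum(shares) / len(shares)


if __name__ == "__main__":
    rng = random.Random(0)
    # 32 leaves x 16 hosts = 512 hosts (an assumed layout).
    print(effective_bisection(16, random_pairs(512, rng)))
```

In this model a pattern where every host sends to the same position in
the next leaf sees no contention at all, while 16 hosts in one leaf
sending to destinations that map to the same spine each get 1/16 of a
link. This is the point about traffic patterns: the result depends
heavily on which pattern you choose to run.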
> > There are some vendors that use only the 24 port switches to build
> > very large scale clusters - 3000 nodes and above, without any
> > oversubscription, and they find it more cost effective. Using single
> > enclosures is easier, but the cables are not expensive and you can use
>
> Price of cables usually depends on the length (copper and
> fiber). Using small switches at the edges allows the use of very
> short cables to the hosts (in-rack), but you still have to use the
> same number of longer cables to connect to the spine. With a single
> enclosure, you may need longer cables to reach the hosts (different
> rack), but you don't need cables to the spine as they are on the
> switch backplane (and PCB is free). Short cables may not be
> expensive, but they are not free. Furthermore, physical cables are
> much less reliable than wire on PCB, and they take more space and
> more power.
>
Again, case by case. You can build a large cluster with very short cables.
Some vendors find that better and some prefer to use large switches -
the largest one is the 3456 port switch from Sun, used in the #4 system
on the Top500 (TACC).
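
To put rough numbers on the 24-port building block and the cable
counts argued about above, here is a small sketch in Python. It assumes
a standard non-blocking folded Clos (fat tree) where every switch below
the top level uses half of its ports downward and half upward; the
function names are illustrative only, and real installations may be
wired differently.

```python
def max_hosts(radix, levels):
    """Largest non-blocking fat tree: 2 * (radix/2)**levels host ports."""
    half = radix // 2
    return 2 * half ** levels


def switch_count(radix, levels):
    """Switches needed for that tree: (2*levels - 1) * (radix/2)**(levels-1)."""
    half = radix // 2
    return (2 * levels - 1) * half ** (levels - 1)


def inter_switch_links(radix, levels):
    """With no oversubscription, each level boundary carries one link per host."""
    return max_hosts(radix, levels) * (levels - 1)


if __name__ == "__main__":
    for levels in (1, 2, 3):
        print(levels,
              max_hosts(24, levels),
              switch_count(24, levels),
              inter_switch_links(24, levels))
```

Under these assumptions, three levels of 24-port switches top out at
3456 host ports, using 720 switches and 6912 switch-to-switch links.
Those links are cables when the fabric is built from discrete 24-port
switches and backplane traces when it is packaged in one enclosure,
which is the trade-off being debated.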