large clusters and topologies

Patrick GEOFFRAY pgeoffra at lhpca.univ-lyon1.fr
Sat Jul 29 06:53:14 PDT 2000


Steffen Persvold wrote:

> First of all, you should know that scalable SCI clusters don't use
> switches but are connected in a torus topology. This is possible because
> the adapters can switch traffic themselves between several link
> controllers (LC). In fact the 6-port SCI-switch is basically 6 LC's
> connected together with the B-Link (Backside Link for SCI link
> controllers). Thus you don't need more than one adapter on the PCI bus,
> just plug on an addoncard (mezzanine) to the adapter, and you have a
> torus topology instead of a single ringlet. Up to two mezzanines can be
> connected (3D).

That's interesting. So, if I understand correctly, the switch is
on board: the B-link plays this role. It's similar to the model of
the ATOLL network, or of another network developed by a university in
Paris (the MPC interconnect), where there's a small crossbar on the
NIC itself.
Basically, with this model, you don't need an external switch, just
a lot of cables :-)

The bottleneck in this case is the bandwidth of the B-link. The
B-link is a 64-bit/50 MHz bus (400 MB/s) with a very efficient
arbitration loop (1 cycle). That's fine with a 32-bit/33 MHz PCI bus,
as the B-link can sustain at least 3 times the PCI traffic. But
with 64-bit/66 MHz PCI, the B-link is not able to sustain the PCI
bandwidth. (We have measured 500 MB/s on 64/66 PCI on a
Pentium-based motherboard.)
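To make the comparison concrete, here is a minimal sketch of the peak-bandwidth arithmetic behind the argument, using the theoretical bus figures from the discussion (width in bits times clock rate; the helper name `peak_mb_s` is mine, and real sustained numbers like the measured 500 MB/s on 64/66 PCI are of course lower than the theoretical peak):

```python
# Peak bandwidth of a parallel bus: (width_bits / 8) bytes * clock_MHz -> MB/s.
# Figures are the theoretical peaks quoted in the discussion, not measurements.

def peak_mb_s(width_bits, clock_mhz):
    """Theoretical peak bandwidth in MB/s for a parallel bus."""
    return width_bits / 8 * clock_mhz

b_link = peak_mb_s(64, 50)   # 400 MB/s
pci_33 = peak_mb_s(32, 33)   # 132 MB/s
pci_66 = peak_mb_s(64, 66)   # 528 MB/s

# The B-link covers roughly 3x a 32/33 PCI bus, but less than 1x a 64/66 one.
print(f"B-link vs 32/33 PCI: {b_link / pci_33:.2f}x")
print(f"B-link vs 64/66 PCI: {b_link / pci_66:.2f}x")
```

The ratios are what matter: about 3x headroom over 32-bit/33 MHz PCI, but under 1x against 64-bit/66 MHz PCI, which is why the B-link becomes the bottleneck on the faster bus.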

> CONCLUSION: When the bandwidth provided by the SCI interconnect is
> higher than one provided on PCI, the scalability in terms of bandwidth
> is linear up to 1700 nodes (assuming a 3D-torus).

And only if the bandwidth of the B-link is large enough to sustain 3
times the PCI bandwidth for a 3D torus. With a 64/66 PCI bus, you
can only do 1D.

Anyway, the paper is well written; it's a good reference.
Do you have, or plan, similar studies about scalability in terms of
latency? In the case of a 3D torus, the number of "hops" to reach a
node at the other end of the torus can be large, so the cost of
crossing the intermediate B-links will increase linearly with the
number of hops.
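To give an idea of the scale, a sketch of the worst-case hop count in a k-ary d-dimensional torus (longest shortest path: floor(k/2) hops per dimension). The 12x12x12 sizing is a hypothetical example of mine, chosen because 1728 nodes is close to the ~1700 mentioned in the paper:

```python
# Worst-case hop count in a torus with k nodes per dimension and d dimensions.
# Each dimension wraps around, so the farthest node is floor(k/2) hops away
# along that dimension; the longest shortest path sums this over dimensions.

def max_hops(k, d):
    """Longest shortest path, in hops, in a k-ary d-dimensional torus."""
    return d * (k // 2)

print(max_hops(12, 3))  # 12x12x12 torus (1728 nodes): 18 hops worst case
```

So even at the ~1700-node scale, a message may cross on the order of 18 intermediate B-links, each adding its forwarding latency.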
Do you know if Dolphin plans to increase the bandwidth of the
B-link to provide full crossbar performance for a 3D-torus
topology with 64/66 PCI, which means at least 3 x 500 MB/s =
1.5 GB/s?

> Finally; if anyone feels offended by getting this information, I am
> sorry.

This is a mailing list, we are here to share information. Don't be
sorry :-)
Nobody will be offended if we talk about technical points (at
least not me).


Patrick Geoffray
---
Aerospatiale Matra - Sycomore
Universite Lyon I - RESAM
http://lhpca.univ-lyon1.fr



