large clusters and topologies

Steffen Persvold sp at
Mon Jul 31 07:56:11 PDT 2000

Patrick GEOFFRAY wrote:
> Steffen Persvold wrote:
> > First of all you should know that scalable SCI clusters don't use
> > switches but are connected in a torus topology. This is possible because
> > the adapters can switch traffic themselves between several link
> > controllers (LCs). In fact, the 6-port SCI switch is basically 6 LCs
> > connected together with the B-Link (Backside Link for SCI link
> > controllers). Thus you don't need more than one adapter on the PCI bus;
> > just plug an add-on card (mezzanine) onto the adapter, and you have a
> > torus topology instead of a single ringlet. Up to two mezzanines can be
> > connected (3D).
> That's interesting. So, if I understand correctly, the switch is
> on board; the B-link plays this role. It's similar to the model of
> the ATOLL network, or another network developed by a university in
> Paris (the MPC interconnect), where there's a small crossbar on the NIC
> itself.
> Basically, with this model, you don't need an external switch, just a
> lot of cables :-)
> The bottleneck in this case is the bandwidth of the B-link. The
> B-link is a 64-bit/50 MHz bus (400 MB/s) with a very efficient
> arbitration loop (1 cycle). That's fine with a 32-bit/33 MHz PCI bus,
> as the B-link can sustain at least 3 times the PCI traffic. But
> with 64-bit/66 MHz PCI, the B-link is not able to sustain the PCI
> bandwidth. (We have measured 500 MB/s on 64/66 PCI on a
> Pentium-based motherboard.)
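The bus figures quoted above can be checked with simple arithmetic. This is only a back-of-the-envelope sketch restating the post's numbers (bus width times clock, theoretical peak, ignoring protocol overhead); the helper name is my own:

```python
# Hypothetical peak-bandwidth check of the numbers quoted in the post.
# Nothing here is measured; it only restates width * clock arithmetic.

def bus_bandwidth_mb_s(width_bits, clock_mhz):
    """Theoretical peak of a parallel bus: width (bits) * clock (MHz) / 8 -> MB/s."""
    return width_bits * clock_mhz / 8

b_link = bus_bandwidth_mb_s(64, 50)     # 400.0 MB/s, as stated for the B-link
pci_32_33 = bus_bandwidth_mb_s(32, 33)  # 132.0 MB/s, classic PCI
pci_64_66 = bus_bandwidth_mb_s(64, 66)  # 528.0 MB/s theoretical (500 MB/s measured above)

print(f"B-link / PCI 32/33: {b_link / pci_32_33:.1f}x")  # ~3x, matching the post
print(f"B-link / PCI 64/66: {b_link / pci_64_66:.1f}x")  # <1x: B-link becomes the bottleneck
```

The ratios line up with the claims above: the B-link covers roughly 3x a 32/33 PCI bus, but falls short of a 64/66 one.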
> > CONCLUSION: When the bandwidth provided by the SCI interconnect is
> > higher than one provided on PCI, the scalability in terms of bandwidth
> > is linear up to 1700 nodes (assuming a 3D-torus).
> And only if the bandwidth of the B-link is large enough to sustain 3
> times the PCI bandwidth for a 3D torus. With a 64-bit/66 MHz PCI bus, you
> can only do 1D.
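One hypothetical way to read the dimensioning rule discussed above: each torus dimension asks the shared B-link to carry roughly one PCI bus's worth of traffic, so the highest dimension the B-link can feed at full PCI rate is the floor of the bandwidth ratio, never less than a single ringlet. This is my own simplified model of the argument, not a formula from the paper:

```python
# Hypothetical model: a d-dimensional torus needs the B-link to sustain
# roughly d times the PCI traffic, so the usable dimension is the floor of
# the bandwidth ratio (with a floor of 1: a single ringlet always works).

def max_torus_dims(b_link_mb_s, pci_mb_s):
    return max(1, int(b_link_mb_s // pci_mb_s))

print(max_torus_dims(400, 133))  # 3 -> a 3D torus scales with 32/33 PCI
print(max_torus_dims(400, 500))  # 1 -> only 1D with 64/66 PCI, as noted above
```

Under this model the numbers reproduce both claims: 3D with classic PCI, 1D once the PCI bus outruns the B-link.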

> Anyway, the paper is well written, it's a good reference.
> Do you have/plan similar studies about the scalability in terms of
> latency? In the case of a 3D torus, the number of "hops" to reach a
> node at the other end of the torus can be large, so the cost to
> cross the intermediate B-links will increase linearly with the
> number of hops.
We do not have such a paper, but I think it should be fairly easy to
publish one.
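The hop-count concern above is easy to sketch. In a k-ary 3D torus the worst-case distance per dimension is k // 2 (the rings wrap around), so the maximum hop count is 3 * (k // 2), and with a fixed per-hop (B-link) cost the end-to-end latency grows linearly in that count. The latency constants below are made-up illustrative values, not Dolphin figures:

```python
# Hypothetical latency sketch for a k x k x k SCI torus.
# Worst-case hops: k // 2 per dimension (rings wrap), 3 dimensions.

def max_hops_3d_torus(k):
    """Worst-case hop count between two nodes in a k x k x k torus."""
    return 3 * (k // 2)

def worst_case_latency_us(k, base_us=2.0, per_hop_us=0.5):
    """base_us and per_hop_us are made-up illustrative constants."""
    return base_us + per_hop_us * max_hops_3d_torus(k)

for k in (4, 8, 12):  # 12**3 = 1728 nodes, near the 1700 mentioned earlier
    print(k**3, "nodes:", max_hops_3d_torus(k), "hops,",
          worst_case_latency_us(k), "us worst case")
```

Even for the ~1700-node case (k = 12), the worst-case path is only 18 hops, so whether the linear growth hurts depends entirely on the per-hop B-link crossing cost.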

> Do you know if Dolphin plans to increase the bandwidth of the
> B-link, to provide full crossbar performance for a 3D-torus
> topology with 64/66 PCI? That means at least 3 x 500 MB/s = 1.5 GB/s.

I know that Dolphin has two types of hardware running on a 64-bit bus:

PSB64: 64-bit/33 MHz w/LC2, B-Link speed 500 MB/s, link speed 500 MB/s

and the new one (coming very soon):

PSB66: 64-bit/66 MHz w/LC3, B-Link speed 600 MB/s, link speed 800 MB/s

(see for more information on adapter types)

I also know that the new LC3 supports both the B-Link and the BXBAR
(crossbar), but I think the PSB (PCI-SCI Bridge) only supports the
B-Link. A crossbar would truly increase scalability and thus performance.
I will ask some Dolphin guys and report back to this list.

One question to you beowulfers: don't you think we need this kind of
analysis for other types of interconnects as well before we argue about
which interconnect is the best? I think it's difficult for a person with
no technical information on the interconnects to decide which technology
is best for their system based only on other people's opinions.

I agree that some interconnects can be difficult to analyze (due to many
vendors), but some of them should be doable (Myrinet, Giganet).

Best regards,
  Steffen Persvold               Systems Engineer
  Email : mailto:sp at     Scali AS (
  Tlf   : (+47) 22 62 89 50      Olaf Helsets vei 6
  Fax   : (+47) 22 62 89 51      N-0621 Oslo, Norway
