Myrinet scalability

Thu Jun 20 02:40:58 PDT 2002

Patrick Geoffray wrote:
> I think it's an interesting contribution to this thread.

Hi Patrick,

yes, it surely is!

> The respondent, Serguei, is arguing correctly but too narrowly.  The
> Scali person claims linear-cost scaling, but with a network with
> constant total capacity (i.e., bisection).  It's pretty easy to achieve
> linear cost that way ;-), but he ignores 50 years of research and
> experience in concurrent computing and networks.  Myrinet Clos networks
> scale up the network capacity with the number of nodes (full-bisection
> Clos networks), a different and far more desirable form of scaling.

Charles is of course correct here, but it should be noted that in a
k-ary n-cube, too, the accumulated bandwidth of independent connections
("total capacity"), and to some degree also the capacity expressed as
bisection bandwidth, scales with the number of nodes (depending on the
ratio of host-to-network bandwidth and the placement of the processes
for this communication pattern). Of course, bisection bandwidth for Clos
scales independently of the placement of the processes, which is nice.
This independence matters if the communication pattern is performed by
individual send/recv operations of the processes. If collective
operations are used, the MPI library can employ an optimized
communication pattern which makes the best use of the available network
capacity.
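
To put rough numbers on the bisection argument, here is a minimal
sketch. It assumes an idealized k-ary n-cube with bidirectional
wraparound links (a torus) and even k >= 4, which is a textbook model
and not a description of any particular SCI or Myrinet product. The
function names are made up for this illustration.

```python
def torus_bisection_links(k: int, n: int) -> int:
    """Links cut when an idealized k-ary n-cube (torus) is split in half.

    Assumption: bidirectional links with wraparound and even k >= 4.
    Cutting one dimension in half severs 2 links in each of the
    k**(n-1) rings that run along that dimension.
    """
    if k < 4 or k % 2 != 0 or n < 1:
        raise ValueError("sketch only covers even k >= 4 and n >= 1")
    return 2 * k ** (n - 1)


def full_bisection_links(nodes: int) -> int:
    """Links across the cut of a full-bisection network with `nodes` hosts."""
    return nodes // 2


if __name__ == "__main__":
    # Compare both networks for a few torus shapes.
    for k, n in [(4, 2), (8, 2), (8, 3), (16, 3)]:
        nodes = k ** n
        torus = torus_bisection_links(k, n)
        full = full_bisection_links(nodes)
        print(f"k={k} n={n} nodes={nodes} torus={torus} full={full}")
```

In this model the torus reaches a fraction 4/k of full bisection: adding
dimensions at fixed k scales the bisection linearly with the node count,
while growing k at fixed n does not.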

But "bisection bandwidth" is only one type of network capacity. The
question remains what influence these different characteristics of the
topologies have on application performance. There is no
application-independent answer to this question.

Concerning the discontinuities in the scaling behaviour, Charles is
again correct, but the actual cost of the SCI network adapter does not
scale by a factor of 2 for 1D -> 2D and 1.5 for 2D -> 3D, but in fact
by 1.14 for 1D -> 2D and 1.3 for 2D -> 3D.
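
As a quick arithmetic check, the per-step factors can be compounded to
compare a 3D adapter with a 1D adapter. This only multiplies the numbers
given above; the variable and function names are made up for the
illustration, and no absolute prices are implied.

```python
# Per-step adapter cost factors as quoted in this thread.
assumed_factors = {"1D->2D": 2.0, "2D->3D": 1.5}
actual_factors = {"1D->2D": 1.14, "2D->3D": 1.3}


def cumulative_factor(steps: dict) -> float:
    """Multiply the per-step factors to get the 1D -> 3D cost factor."""
    total = 1.0
    for factor in steps.values():
        total *= factor
    return total


if __name__ == "__main__":
    print("assumed 1D->3D:", round(cumulative_factor(assumed_factors), 3))
    print("actual  1D->3D:", round(cumulative_factor(actual_factors), 3))
```

So going from 1D to 3D raises the adapter cost by a factor of roughly
1.48 rather than 3.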

> One factor that you should not miss is that more and more of the market
> for cluster interconnect will be for *high-availability* applications.
> The k-ary n-cube is arguably the worst topology for HA, particularly for
> small N and n.  The Clos network is arguably the best topology for HA
> (due to the multiplicity of paths between hosts).

Failing switch ports in Clos networks require re-routing at the source
of a packet (current Myrinet?) or in a number of the remaining switch
ports, which do not necessarily detect the failing switch ports
themselves (as they are not directly connected to them), but have to be
notified of this fact by other means (does anyone have more information
on this?).

Failing switch ports (= nodes) in a k-ary n-cube are noticed immediately
by the neighbouring nodes (= switch ports), which can then switch to a
different route in their routing tables. This is currently done in Scali
systems and works fine, with very little delay, and keeps applications
running.
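
To illustrate why alternative routes exist at all, here is a minimal
sketch that finds a shortest path in a 2D torus (k-ary 2-cube) while
avoiding failed nodes. It is a plain breadth-first search over an
idealized bidirectional torus, not the routing scheme used in Scali
systems; all names are made up for this example.

```python
from collections import deque


def torus_neighbours(node, k):
    """Return the four neighbours of (x, y) in a k x k torus."""
    x, y = node
    return [((x + 1) % k, y), ((x - 1) % k, y),
            (x, (y + 1) % k), (x, (y - 1) % k)]


def route(src, dst, k, failed=frozenset()):
    """Shortest path from src to dst avoiding failed nodes, or None."""
    if src in failed or dst in failed:
        return None
    previous = {src: None}          # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk back from dst to src to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in torus_neighbours(node, k):
            if nxt not in failed and nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None                     # dst is cut off from src


if __name__ == "__main__":
    print(route((0, 0), (2, 0), k=4))
    print(route((0, 0), (2, 0), k=4, failed={(1, 0), (3, 0)}))
```

With both ring neighbours of the source marked as failed, the search
still finds a (longer) path through the second dimension. A node only
becomes unreachable once all of its neighbours have failed.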

This means that HA can be implemented more or less easily for both
topologies. If you use a very small number of hosts with SCI, a central
switch is preferable to a k-ary n-cube topology (as Sun's HA cluster
solutions have done for years). Stackable 8-port switches are currently
available.

  regards, Joachim

-- 
|  _  RWTH|  Joachim Worringen
|_|_`_    |  Lehrstuhl fuer Betriebssysteme, RWTH Aachen
  | |_)(_`|  http://www.lfbs.rwth-aachen.de/~joachim
    |_)._)|  fon: ++49-241-80.27609 fax: ++49-241-80.22339