[Beowulf] Re: torus versus (fat) tree topologies
Isaac Dooley
idooley at isaacdooley.com
Mon Nov 8 14:23:08 PST 2004
Note that the large BlueGene/L machine (probably the fastest in the world) uses several different networks. IBM's paper "An Overview of the BlueGene/L Supercomputer" describes them: "The nodes are interconnected through five networks: a 3D torus network for point-to-point messaging between compute nodes, a global combining/broadcast tree for collective operations such as MPI_Allreduce over the entire application, a global barrier and interrupt network, a Gigabit Ethernet to JTAG network for machine control, and another Gigabit Ethernet network for connection to other systems, such as hosts and file systems..."
The main communication network is the 3-D torus. Torus topologies do eliminate expensive switches, since nodes connect directly to other nodes, but latency can be an issue. A message to a non-neighboring node must be sent to an intermediate node and then forwarded, which is slow if software is involved, although the forwarding may be supported on the NIC (as is done in BG/L).
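To make the forwarding cost concrete, here is a minimal sketch of how the hop count between two nodes of a 3-D torus can be computed. The function names and the 8x8x8 size are my own illustrative assumptions, not BG/L's actual dimensions or routing code; the only idea used is that each dimension wraps around, so a message takes the shorter way around each ring.

```python
def torus_hops(src, dst, dims):
    """Minimum number of link hops between two nodes of a torus.

    src, dst: coordinate tuples such as (x, y, z).
    dims: size of the torus in each dimension (hypothetical values).
    """
    hops = 0
    for a, b, size in zip(src, dst, dims):
        direct = abs(a - b)
        # The wraparound link lets a message go the other way around the ring.
        hops += min(direct, size - direct)
    return hops


def torus_diameter(dims):
    """Worst-case hop count: half way around the ring in every dimension."""
    return sum(size // 2 for size in dims)


if __name__ == "__main__":
    dims = (8, 8, 8)  # hypothetical 512-node torus
    print(torus_hops((0, 0, 0), (7, 7, 7), dims))  # neighbors via wraparound
    print(torus_hops((0, 0, 0), (4, 4, 4), dims))  # farthest node
    print(torus_diameter(dims))
```

Every hop beyond the first is a forwarding step at an intermediate node, which is why it matters whether that step is done in software or in the network hardware.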
Fat-tree networks allow high bisection bandwidth, along with large numbers of contention-free "virtual" paths, all using dedicated (possibly line-speed) switching hardware, which gives low latency. Latency is one of the hardest problems in HPC, and it kills many fine-grained parallel applications. Also, using a separate switched network allows nodes to die without breaking the network (or its routing algorithms). And most people are more familiar with tree-style networks like Ethernet, so it is easy to adapt to a fat tree.
One further thing to note is that fat trees give multiple contention-free paths between nodes, whereas a normal tree has only a single path between any two nodes. That is one reason why fat trees are used in big machines.
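The single-path versus multiple-path difference can be checked on a toy model. The sketch below is only an assumption-laden simplification: it builds a two-level network (hosts, leaf switches, and top-level switches) and counts shortest paths between two hosts with a breadth-first search. With one top-level switch it is an ordinary tree; with several, it behaves like a small fat tree. The function names and sizes are hypothetical and do not describe any real machine.

```python
from collections import deque


def build_two_level(leaves, hosts_per_leaf, spines):
    """Adjacency map: every leaf switch links to every top-level switch."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for leaf in range(leaves):
        for host in range(hosts_per_leaf):
            link(("host", leaf, host), ("leaf", leaf))
        for spine in range(spines):
            link(("leaf", leaf), ("spine", spine))
    return adj


def count_shortest_paths(adj, src, dst):
    """Number of distinct shortest paths from src to dst (BFS counting)."""
    dist = {src: 0}
    ways = {src: 1}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                ways[nxt] = ways[node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # Another equally short route into nxt.
                ways[nxt] += ways[node]
    return ways.get(dst, 0)


if __name__ == "__main__":
    a, b = ("host", 0, 0), ("host", 1, 0)
    plain_tree = build_two_level(leaves=4, hosts_per_leaf=2, spines=1)
    fat_tree = build_two_level(leaves=4, hosts_per_leaf=2, spines=4)
    print(count_shortest_paths(plain_tree, a, b))
    print(count_shortest_paths(fat_tree, a, b))
```

In this model the number of equal-length routes between hosts on different leaf switches equals the number of top-level switches, so traffic between different pairs of hosts can be spread over separate links instead of competing for one.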
Isaac Dooley
Parallel Programming Lab, UIUC
>My investigation thus far has led me to believe that one reason a torus
>topology might be better is because it eliminates the need for a switch.
> On the other hand fat tree interconnects seem to dominate the
>larger(est) clusters out there, why?