[Beowulf] torus versus (fat) tree topologies

daniel.kidger at quadrics.com daniel.kidger at quadrics.com
Wed Nov 17 11:32:17 PST 2004


yup,
  Just to add a little to Federica's reply...

The 25ns Craig quotes is an upper bound for QsNetII (elan4/elite4). 21ns is more typical.  
(QsNetI (elan3/elite3) was a max of 40ns iirc.)

The Hot Chips paper shows the breakdown of latency for an 8-byte write on a 4096-node cluster.
Quadrics uses a fat tree built from 8-port, 4-up/4-down switch chips. Since 4096 = 4^6, there are 6 levels in the switch hierarchy for such a cluster. The bottom three levels sit in the node-level chassis and the top three in the spine chassis. Such a network has a diameter of 11 switches (5 on the way up, one at the top, and 5 back down), giving 11 * 21ns = 231ns.
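The level/diameter arithmetic above can be sketched as follows (a rough model, assuming a uniform 4-up/4-down fat tree; the function name and structure are illustrative, not Quadrics code):

```python
def fat_tree_stats(nodes, radix_down=4, hop_ns=21):
    """Worst-case switch-hop latency in a k-up/k-down fat tree.

    levels:   smallest number of switch stages whose downward fan-out
              covers all nodes (radix_down ** levels >= nodes)
    diameter: switches traversed node-to-node in the worst case:
              (levels - 1) up, one across the top stage, (levels - 1) down
    """
    levels, capacity = 0, 1
    while capacity < nodes:
        capacity *= radix_down
        levels += 1
    diameter = 2 * levels - 1
    return levels, diameter, diameter * hop_ns

levels, diameter, latency_ns = fat_tree_stats(4096)
print(levels, diameter, latency_ns)  # 6 11 231
```

For 4096 nodes this reproduces the 6 levels, diameter 11, and 231ns quoted above.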

The 1.38us MPI latency that Quadrics showed at SC'04 was done with two adjacent nodes in a 128-way system. The furthest two nodes would have added 4*21ns ~= 0.08us of latency.  But for marketing purposes 1.38 looks better than 1.46 :-)
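Checking that arithmetic (the 4 extra switch hops for the furthest pair are taken from the figures above; this just verifies the sums):

```python
base_us = 1.38      # measured adjacent-node MPI latency, 128-way system
extra_hops = 4      # additional switch traversals for the furthest node pair
hop_ns = 21         # per-switch latency

worst_us = base_us + extra_hops * hop_ns / 1000
print(round(worst_us, 2))  # 1.46
```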

Note that the same graph in the Hot Chips paper also shows the measured latency in the cables: 218ns for 50m of cable (as four 12.5m lengths). Light in a vacuum can only travel 65.4m in that time (3e8 m/s * 218e-9 s). For low-latency interconnects on very large clusters, the speed of light is becoming increasingly significant. The 1.38us quoted above was measured on a cluster using 5m cables; had we used, say, 1.5m cables, this would in theory have shaved ~0.03us off the latency.
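A quick check of those cable-delay figures (the assumption of two cable runs on the adjacent-node path is mine, chosen to match the quoted saving):

```python
# Measured: 218 ns of propagation delay over 50 m of cable
ns_per_m = 218 / 50                  # ~4.36 ns/m in cable

c = 3e8                              # speed of light in vacuum, m/s
light_distance_m = c * 218e-9        # how far light travels in 218 ns
print(round(light_distance_m, 1))    # 65.4

# Saving from 1.5 m cables instead of 5 m, over two cable runs
# (node-to-switch and switch-to-node -- an assumption)
saving_us = 2 * (5 - 1.5) * ns_per_m / 1000
print(round(saving_us, 2))           # 0.03
```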

Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------



> Hi Craig, Mark
> 
> just a few clarifications:
> 
> the Hot Chips paper data showed the latency on a 4096-node machine
> 
> the latency from pad to pad on the Elite chip is about 21 
> nanoseconds; to go from anywhere to anywhere you pass through 
> 11 Elite switch chips (5 on the way up and 6 on the way down), 
> thus 21 * 11 = 231ns.
> 
> also: in the Hot Chips paper the cable delay included the trace 
> delay on the PCB and within the midplane, so the figure quoted 
> for the Elite cable delay is the time required for the head of 
> the packet to be fully routed across an unblocked network; in 
> particular, this includes all the arbitrations at each stage of 
> the network.
> 
> Finally, 1.38 microseconds is the MPI 0-byte ping-pong 
> latency through a single Elite that we recently measured on 
> 2.2GHz, NUMA-enabled Tyan Opteron nodes. 
> 
> Hope this helps, 
> 
> Federica
> 
> -----Original Message-----
> From: Craig Tierney [mailto:ctierney at HPTI.com]
> Sent: 17 November 2004 16:25
> To: Mark Hahn
> Cc: beowulf at beowulf.org
> Subject: Re: [Beowulf] torus versus (fat) tree topologies
> 
> 
> On Tue, 2004-11-16 at 17:07, Mark Hahn wrote:
> > > Mmm ... from your 2003 Hot Chips presentation on Elan 4 I see 231 
> > > nanos.  Which is right, or are we talking about two 
> different things?
> > 
> > AFAICT, the 25ns figure is for an individual 8-port xbar chip,
> > and a full-sized switch is three stages of these.  but 6*25 != 231.
> > I believe there's at least one quadrics doc that quotes 300ns for 
> > the switch.  perhaps the 231 number is derived from average latency
> > (since some ports are just one xbar away)?
> > 
> > also, isn't SGI's numalink network a dual fat-tree?  
> they're claiming
> > 1.1 us latency these days (though again, that might be averaged over
> > all possible paths...)
> 
> Is that shmem latency or MPI latency?  I think that the MPI 
> latency is closer to 2us.
> 
> Craig
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) 
> visit http://www.beowulf.org/mailman/listinfo/beowulf
> 
> 

