[Beowulf] Gigabit switch recommendations

Tony Ladd ladd at che.ufl.edu
Tue Mar 28 20:44:39 PST 2006

We recently tested 48-port gigabit switches from Extreme Networks (summit-48t) and
Force10 (s50). We found that the Extreme Networks switch performed better than
the Force10 when all 48 ports were active. The s50 appeared to "choke" at
certain message sizes, leading to erratic rates and reduced overall
performance. The summit was much smoother, with very little variation in
throughput. For example, a bidirectional edge exchange had a max throughput
of 1540Mbps under LAM (using the Broadcom NIC), while 16 pairs (32 nodes) had
a max throughput of 1520Mbps per pair; the optimum message size was about
250KBytes. We also tested two switches connected by a 10G stacking cable. We
could connect 12 pairs of ports (12 on each switch) and run at essentially
the same speed (around 1500Mbps per pair) through the stacking cable.
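As a rough consistency check on the stacking result, here is the arithmetic for the traffic crossing the 10G stacking cable. It assumes (this is not stated in the measurements) that the ~1500Mbps "bidirectional" figure per pair is the sum of both directions, and that the stacking cable carries 10Gbps in each direction.

```python
# Rough estimate of stacking-cable load for the 12-pair cross-switch test.
# Assumptions (not part of the measurements):
#   - the ~1500 Mbps bidirectional figure per pair is the sum of both directions
#   - the stacking cable carries 10 Gbps in each direction (full duplex)

pairs = 12                      # port pairs split across the two switches
bidir_per_pair_mbps = 1500      # measured, approximate
per_direction_mbps = bidir_per_pair_mbps / 2

cable_load_mbps = pairs * per_direction_mbps   # one-way traffic across the cable
cable_capacity_mbps = 10_000

utilization = cable_load_mbps / cable_capacity_mbps
print(f"load per direction: {cable_load_mbps:.0f} Mbps "
      f"({utilization:.0%} of the stacking link)")
```

Under these assumptions the cable runs at roughly 90% load, which is consistent with the 12-pair test running at essentially full speed.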

There are a lot of hidden gotchas in switch technology, so "wire speed" means
next to nothing. For example, the Force10 switch (which is a good edge
switch) has four 12-port ASICs. Ports on the same ASIC really do communicate
at wire speed, but between ASICs the max bandwidth is 10Gbps, so the max
throughput is only 83% of what you would expect. By contrast, the Extreme
switch is supposedly "flat", with full bandwidth under all port
configurations. The Broadcom NICs could not push data fast enough to really
stress the Extreme switch (only about 1500Mbps max per pair), but with
MPIGAMMA I can get over 1800Mbps between pairs, which will increase the load
on the switch.
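The 83% figure for the Force10 follows from the port count per ASIC: twelve 1Gbps ports can offer up to 12Gbps toward another ASIC, but the link between ASICs carries only 10Gbps. A minimal sketch of that arithmetic, assuming the worst case where all traffic from one ASIC's ports is destined for ports on a different ASIC:

```python
# Worst-case throughput fraction when every port on one ASIC talks to
# ports on another ASIC (assumption: the 10 Gbps limit applies per direction).
ports_per_asic = 12
port_speed_gbps = 1.0
inter_asic_gbps = 10.0

offered_gbps = ports_per_asic * port_speed_gbps        # 12 Gbps of potential traffic
achievable_gbps = min(offered_gbps, inter_asic_gbps)   # capped by the inter-ASIC link
fraction = achievable_gbps / offered_gbps

print(f"{fraction:.0%} of wire speed")
```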

These switches are not cheap; they list for $6000-8000, but they outperform
the cheaper switches by a considerable margin. We have not been able to get
close to the theoretical bandwidth from our cheap GigE switches (HP 2724 and
3Com SS3).

I have recently run NetPIPE with MPI/GAMMA
(http://www.disi.unige.it/project/gamma/mpigamma/) using two Intel PRO/1000
NICs (82545GM) wired back-to-back. The nodes are Dell PE850s with 3.0GHz P4D
(dual-core) processors. MPI latency was 8.6 microsecs one way and 8.8 microsecs
for bidirectional messages. The max throughput was 983Mbps one way and 1856Mbps
bidirectional. The half-throughput message size is about 4KBytes. These results
are consistent with the ping-pong tests reported on the MPIGAMMA website. The
higher throughput will enable a better test of switch performance.
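To illustrate why the faster NICs matter for switch testing, this sketch compares the one-way traffic that 12 ports would offer to a 10Gbps full-duplex link (such as the Force10 inter-ASIC link or the stacking cable), using the LAM/Broadcom and MPI/GAMMA bidirectional figures. It assumes each bidirectional figure is the sum of both directions; the helper `offered_load` is hypothetical, and this is illustrative arithmetic rather than a measurement.

```python
# Offered one-way load from 12 ports on a 10 Gbps full-duplex link,
# using the per-pair bidirectional throughputs quoted in the post.
# Assumption: a "bidirectional" figure is the sum of both directions.

LINK_MBPS = 10_000
PORTS = 12

bidir_mbps = {
    "LAM + Broadcom": 1540,
    "MPI/GAMMA + PRO/1000": 1856,
}

def offered_load(bidir: float, ports: int = PORTS) -> float:
    """One-way traffic (Mbps) that `ports` senders put on the link."""
    return ports * bidir / 2

for name, bidir in bidir_mbps.items():
    load = offered_load(bidir)
    print(f"{name:22s} {load:6.0f} Mbps  ({load / LINK_MBPS:.0%} of link)")
```

Under these assumptions the Broadcom traffic (about 9.2Gbps) stays just under 10Gbps, while the GAMMA traffic (about 11.1Gbps) would exceed it, so only the latter could expose a 10Gbps bottleneck. Note that the 1856Mbps figure was measured back-to-back, not through a switch.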

We have just installed a stacked array of four summit-48ts. I will post
benchmarks soon.


Tony Ladd
Professor, Chemical Engineering
University of Florida
PO Box 116005
Gainesville, FL 32611-6005

Tel: 352-392-6509
FAX: 352-392-9513
Email: tladd at che.ufl.edu
Web: http://ladd.che.ufl.edu 

