[Beowulf] which 24 port unmanaged GigE switch?

Mon Apr 5 12:40:06 PDT 2010

Michael Di Domenico wrote:
> I would have to agree.  I have Netgears in my lab now and for light
> use they seem to be okay, but once you run a communications heavy MPI
> job over them they seem to fall down

Please define "fall down".

One test I have applied to a switch (only 100baseT) to see if it could
handle "full traffic" was running the script below on all nodes:

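#!/bin/bash
# Look up the next node in the chain; the last node gets "none".
TINFO=$(topology_info)
NEXT=$(echo $TINFO | extract -mt -cols '[3]')
# Quote $NEXT so an empty value does not break the test.
if [ -n "$NEXT" ] && [ "$NEXT" != "none" ]
then
  # Time sending 4096 x 1000000 bytes (about 4.1 GB) to the next node.
  TIME=$(accudate -t0)
  dd if=/dev/zero bs=4096 count=1000000 | rsh "$NEXT" 'cat - >/dev/null'
  accudate -ds "$TIME" >/tmp/elapsed_${HOSTNAME}.txt
fi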

Here topology_info defines a linear chain through all nodes, and what
ends up in each elapsed_HOSTNAME.txt file is the transmission time from
that node to the next node in the chain.  extract and accudate are mine;
the former is like "cut" and the latter is just used here to calculate
an elapsed time.
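
For anyone who would rather not install accudate, the timing step can
probably be approximated with standard tools.  What follows is only a
sketch: now_ns and elapsed_seconds are hypothetical helper names (not
part of drm_tools), GNU date is assumed for the %N format, and the
output format is not claimed to match what accudate -ds prints.

```shell
#!/bin/bash
# Hypothetical stand-ins for "accudate -t0" and "accudate -ds".
# Assumes GNU date, which supports %N (nanoseconds).

# Print the current time as nanoseconds since the epoch.
now_ns() {
  date +%s%N
}

# elapsed_seconds START_NS [END_NS]
# Print END_NS - START_NS in seconds with millisecond resolution.
# END_NS defaults to the current time.
elapsed_seconds() {
  local start=$1
  local end=${2:-$(now_ns)}
  local ms=$(( (end - start) / 1000000 ))
  printf '%d.%03d\n' $(( ms / 1000 )) $(( ms % 1000 ))
}

# Usage inside the test script (in place of the two accudate calls):
#   TIME=$(now_ns)
#   dd if=/dev/zero bs=4096 count=1000000 | rsh "$NEXT" 'cat - >/dev/null'
#   elapsed_seconds "$TIME" >/tmp/elapsed_${HOSTNAME}.txt
```

Integer arithmetic is used so that the large nanosecond values are not
rounded on the way through.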

Comparing this against a two node reference run is slightly apples and
oranges, because in the two node (reference) test the target node is
only accepting packets, whereas when all nodes are running it is also
sending packets, and those compete with the ACKs going back to the
first node.  The D-Link switch held up quite well, I thought.  One pair
of nodes tested alone completed in 350 seconds (+/-), whereas that pair
and the others took 370-380 seconds when they were all running at once
(20 compute nodes, first only sends, last only receives).  Each
transfer is 4096 x 1000000 bytes, so that is 11.7 MB/sec for the lone
pair and about 10.8 MB/sec for all pairs at once.  For GigE it should
come out at 117 and 108 (or so), if the switch can keep up.
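
The MB/sec figures above follow directly from the dd parameters and the
elapsed times.  A small sketch of the arithmetic, where mb_per_sec is a
hypothetical helper name and 1 MB is taken as 10^6 bytes:

```shell
#!/bin/bash
# mb_per_sec ELAPSED_SECONDS [BLOCK_SIZE] [COUNT]
# Print throughput in MB/sec (1 MB = 10^6 bytes), one decimal place.
# BLOCK_SIZE and COUNT default to the dd values used in the test script.
mb_per_sec() {
  local secs=$1 bs=${2:-4096} count=${3:-1000000}
  awk -v s="$secs" -v b="$bs" -v c="$count" \
    'BEGIN { printf "%.1f\n", (b * c) / s / 1e6 }'
}

mb_per_sec 350   # lone pair: prints 11.7
mb_per_sec 380   # all pairs at once, slowest case: prints 10.8
```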

I'm curious what the Netgears and HP do in a test like this.  If anybody
would like to try this, all the pieces for this simple test (if you can
run binaries for a 32 bit x86 environment) are here:

 http://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/testswitch.tar.gz

(For other platforms, obtain source for accudate and extract from here:

 http://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/drm_tools.tar.gz

)

Start the jobs simultaneously on all nodes using whichever queue system
you have installed.  Be sure to run it once first with a small count
value to force anything coming over NFS into cache before doing the big
test.  (Or one could run NetPIPE on each pair of nodes, or anything else
really that loads the network.)
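
Once the run finishes, the per-node result files can be gathered into
one directory and listed slowest first to see which links suffered
most.  This is only a sketch: summarize_elapsed is a hypothetical
helper, and it assumes the first line of each elapsed_HOSTNAME.txt file
is a plain number of seconds, which may not be exactly what accudate -ds
writes.

```shell
#!/bin/bash
# summarize_elapsed DIR
# Print "host seconds" for every elapsed_<host>.txt in DIR, slowest first.
# Assumes the first line of each file is a bare number of seconds.
summarize_elapsed() {
  local dir=$1 f host secs
  for f in "$dir"/elapsed_*.txt; do
    [ -e "$f" ] || continue        # no matching files: skip the literal glob
    host=${f##*/elapsed_}          # strip directory and "elapsed_" prefix
    host=${host%.txt}              # strip ".txt" suffix
    read -r secs < "$f"            # first line only
    printf '%s %s\n' "$host" "$secs"
  done | sort -k2,2 -n -r
}

# Usage, after copying each node's /tmp/elapsed_*.txt into ./results:
#   summarize_elapsed ./results
```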

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech