[Beowulf] Terrible scaling with a Cisco Catalyst 2948G-GE-TX Switch

Mauricio Carrillo Tripp trippm at gmail.com
Fri Mar 11 12:56:26 PST 2005


Hi, I'm new to this list. I hope somebody here can help me out.
I've been using a 16-node cluster for quite a while; the switch I'm using
is a cheap Dell switch (PowerConnect 2624, ~$300). I run molecular dynamics
simulations using NAMD, and the scaling I got was good (around 75%
performance on 16 nodes).
I recently got another 32 PCs to build a new cluster, but this time I had to
buy a more expensive switch (Cisco Catalyst 2948G-GE-TX, ~$3,000). To
my surprise, the scaling is just TERRIBLE, and after a lot of tests I finally
found that the problem is the switch (I'm not sure what or why exactly, though).
With the Cisco switch and the Dell switch on the same cluster, running
exactly the same program, I get very different scaling (please see Fig. 3 at
http://chem.acad.wabash.edu/~trippm/Clusters/performance.php).
I logged into the Cisco switch's interface, disabled spantree, enabled
portfast and set the ports to be always 1000tx (instead of auto-detect),
but nothing I do seems to help.
So, if there is someone else using this type of Cisco switch, could you tell me
if you found the same behaviour and figured out what was wrong?
Any other thoughts will be appreciated too.
Thanks.
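
For anyone comparing numbers: the scaling figure quoted above (around 75% on
16 nodes) can be read as parallel efficiency, i.e. speedup divided by node
count. That reading is an assumption, not something stated in the post. A
minimal sketch of the arithmetic follows; the function names and the timings
are hypothetical placeholders, not measurements from either cluster.

```python
def speedup(t_single, t_parallel):
    """Speedup of a parallel run relative to the single-node run."""
    return t_single / t_parallel


def efficiency(t_single, t_parallel, nodes):
    """Parallel efficiency: speedup divided by the number of nodes."""
    return speedup(t_single, t_parallel) / nodes


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds for the same benchmark,
    # keyed by node count. Replace with real timings from each switch.
    t_single = 1600.0
    timings = {
        "switch A": {4: 420.0, 8: 230.0, 16: 133.0},
        "switch B": {4: 500.0, 8: 400.0, 16: 380.0},
    }
    for switch, runs in timings.items():
        for nodes, t in sorted(runs.items()):
            eff = efficiency(t_single, t, nodes)
            print(f"{switch}: {nodes:2d} nodes, efficiency {eff:.0%}")
```

Running the same benchmark at several node counts on each switch and
tabulating efficiency this way makes it easy to see at which node count the
two switches start to diverge.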


-- 

Mauricio Carrillo Tripp, PhD
Department of Chemistry
Wabash College
trippm at wabash.edu
http://chem.acad.wabash.edu/~trippm


