[Beowulf] choosing a high-speed interconnect

Dan Kidger daniel.kidger at quadrics.com
Tue Oct 12 16:44:28 PDT 2004


Chris,

> I'm sure posing this may raise more questions than answer but which
> high-speed interconnect would offer the best 'bang for the buck':
>
> 1) myrinet
> 2) quadrics qsnet
> 3) mellanox infiniband
>
> Currently, our 30 node dual Opteron (MSI K8D Master-FT boards) cluster
> uses Gig/E and are looking to upgrade to a faster network.


Well, I am from one of the vendors that you cite, so perhaps my reply is biased.
But hopefully I can reply without it seeming like a sales pitch.

Our QsNetII interconnect sells for around $1700 per node (card = $999; the rest is 
cable and a share of the switch). A 4U-high 32-way switch would be the nearest 
match in terms of size for a 30-node cluster (c. $14K IIRC).

MPI bandwidth is 875MB/s on Opteron (higher on, say, IA64/Nocona, but the AMD 
PCI-X bridge limits us).
MPI latency is 1.5us on Opteron - only slightly better than the Cray/Octigabay 
Opteron product (usually quoted as 1.7us).

Infiniband bandwidth is only a little less than ours, and its latency is roughly 
twice ours. Myrinet lags a fair bit at the moment, but they have a new, faster 
product about to hit the market which you should look out for.
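
(For what it's worth, figures like these are usually measured with a simple MPI 
ping-pong microbenchmark. A rough sketch of the idea is below - the message size 
and iteration count are just illustrative and have nothing to do with how any of 
the vendors quote their official numbers.)

    /* Minimal MPI ping-pong sketch: ranks 0 and 1 bounce a message back and
     * forth; half the round-trip time approximates the one-way latency for
     * small messages and gives the bandwidth for large ones. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        /* message size in bytes: ~8 for latency, ~1000000 for bandwidth */
        int nbytes = (argc > 1) ? atoi(argv[1]) : 8;
        int rank, iters = 1000, i;
        char *buf;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = malloc(nbytes);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0) {
            double one_way = (t1 - t0) / (2.0 * iters);   /* seconds */
            printf("%d bytes: %.2f us one-way, %.1f MB/s\n",
                   nbytes, one_way * 1e6, nbytes / one_way / 1e6);
        }
        free(buf);
        MPI_Finalize();
        return 0;
    }

Compile with mpicc and run with two ranks on two different nodes, e.g. 
'mpirun -np 2 ./pingpong 8' for latency and 'mpirun -np 2 ./pingpong 1000000' 
for bandwidth; the exact numbers will of course depend on the MPI library and 
on where the two processes land.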


All vendors offer a variety of switch sizes, either as fixed-size configurations 
or as a chassis that takes one or more line cards and can be upgraded if your 
cluster is expanded. Some solutions, such as Myrinet revE cards, need two switch 
ports per node, but otherwise you just need a switch big enough for your node 
count, allowing for possible future expansion.

Very large clusters have multiple switch cabinets arranged as node-level 
switches, which have links to the nodes, and top-level 'spine' switch cabinets 
that interconnect the node-level cabinets. If you have the same number of links 
going up to the spine switches as you have going down to the actual nodes, then 
you have 'full bisectional bandwidth'. However, you can save money by cutting 
back on the amount of spine switching you buy.
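
As a back-of-the-envelope illustration of that trade-off (the figures below are 
made up for the example, not any vendor's specification), cutting the number of 
spine links roughly cuts the worst-case bisection bandwidth in proportion:

    /* Rough sketch: bisection bandwidth of a two-level (leaf + spine) network
     * with uniform links. All figures are illustrative placeholders. */
    #include <stdio.h>

    int main(void)
    {
        double link_mb_s = 875.0;   /* per-link MPI bandwidth in MB/s   */
        int nodes   = 30;           /* links going down to the nodes    */
        int uplinks = 30;           /* links going up to the spine      */

        /* Fewer uplinks than nodes means the network is oversubscribed and
         * the worst-case traffic across the bisection drops in proportion. */
        double ratio = (double)uplinks / nodes;
        double bisection;
        if (ratio > 1.0) ratio = 1.0;
        bisection = (nodes / 2.0) * link_mb_s * ratio;

        printf("%.2f:1 oversubscription, bisection ~ %.0f MB/s\n",
               (double)nodes / uplinks, bisection);
        return 0;
    }

Whether that loss matters depends on how much all-to-all traffic your 
application actually generates.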

Many interconnect vendors offer a choice of copper or fibre cabling. The 
former is often cheaper (no expensive lasers) but the latter can be used for 
longer cable runs and is often easier to physically manage particularly when 
installing very large clusters.

What to buy depends very much on your application. Maybe you haven't proved 
that your GigE is the limiting factor. I do have figures for Fluent on ours 
and other interconnects but the Beowulf list is not the correct place to post 
these.
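
One quick-and-dirty way to find out, before spending anything, is to time how 
much of the wall-clock time is actually spent in MPI on the GigE you already 
have. A minimal sketch of the idea follows - compute_step() and halo_exchange() 
are hypothetical placeholders for whatever your application really does:

    /* Sketch of measuring the fraction of runtime spent in communication. */
    #include <mpi.h>
    #include <stdio.h>

    void compute_step(void)  { /* the application's compute kernel       */ }
    void halo_exchange(void) { /* the application's MPI exchange phase   */ }

    int main(int argc, char **argv)
    {
        int rank, step;
        double t_comm = 0.0, t_total, t0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        t_total = MPI_Wtime();
        for (step = 0; step < 1000; step++) {
            compute_step();
            t0 = MPI_Wtime();
            halo_exchange();
            t_comm += MPI_Wtime() - t0;
        }
        t_total = MPI_Wtime() - t_total;

        /* If this fraction is small, a faster interconnect buys little. */
        if (rank == 0)
            printf("communication: %.1f%% of runtime\n",
                   100.0 * t_comm / t_total);

        MPI_Finalize();
        return 0;
    }

If the communication fraction is only a few percent, a faster interconnect will 
not buy you much; if it dominates, any of the three options above should help.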

As Robert pointed out, most vendors will loan equipment for a month or so, and 
indeed many can provide external access to clusters for benchmarking purposes. 
For example, the AMD Developer Center has large Myrinet and Infiniband clusters 
that you can ask for access to.

Hope this helps,
Daniel

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------





