[Beowulf] choosing a high-speed interconnect
daniel.kidger at quadrics.com
Tue Oct 12 16:44:28 PDT 2004
> I'm sure posing this may raise more questions than answer but which
> high-speed interconnect would offer the best 'bang for the buck':
> 1) myrinet
> 2) quadrics qsnet
> 3) mellanox infiniband
> Currently, our 30 node dual Opteron (MSI K8D Master-FT boards) cluster
> uses Gig/E and are looking to upgrade to a faster network.
Well, I am from one of the vendors that you cite, so perhaps my reply is biased.
But hopefully I can reply without it seeming like a sales pitch.
Our QsNetII interconnect sells for around $1700 per node (card=$999, rest is
cable and share of the switch). A 4U high 32-way switch would be the nearest
match in terms of size for a 30-node cluster (c. $14K iirc).
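
As a rough illustration of the arithmetic behind those figures, here is a
minimal Python sketch. It uses only the approximate prices quoted above. The
function names are made up for this example, and treating whatever is left
after the card and the switch share as cabling is an assumption, not a quoted
price.

```python
# Rough cost arithmetic from the approximate figures quoted above.
# All names are hypothetical; prices are the "around" values in the post.

def total_interconnect_cost(nodes, per_node_usd=1700):
    """Total cost assuming a flat per-node price (card + cable + switch share)."""
    return nodes * per_node_usd


def per_node_split(per_node_usd=1700, card_usd=999,
                   switch_usd=14000, switch_ports=32):
    """Split the per-node price into card, switch share and remainder.

    Assumption: the switch cost is spread evenly over all of its ports,
    and whatever is left over is attributed to cabling.
    """
    switch_share = switch_usd / switch_ports
    remainder = per_node_usd - card_usd - switch_share
    return card_usd, switch_share, remainder


print(total_interconnect_cost(30))  # 30 nodes at roughly $1700 each
print(per_node_split())             # (card, switch share, remainder) in USD
```

With only 30 of the 32 ports in use, the real switch share per node would be
slightly higher than this even split suggests.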
MPI bandwidth is 875MB/s on Opteron (higher on, say, IA64/Nocona, but the AMD
PCI-X bridge limits us).
MPI latency is 1.5us on Opteron, only slightly better than the Cray/OctigaBay
Opteron product (usually quoted as 1.7us).
Infiniband bandwidth is only a little less than ours, and latency not much
worse than twice ours. Myrinet lags a fair bit currently but they do have a
new faster product soon to hit the market which you should look out for.
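
To see how latency and bandwidth combine for a given message size, a simple
first-order model is time = latency + size / bandwidth. The sketch below is
only that textbook model, not a measurement; it plugs in the 1.5us and
875MB/s figures quoted above and assumes 1MB = 10^6 bytes. The function names
are invented for illustration.

```python
# First-order point-to-point model: t(n) = latency + n / bandwidth.
# Figures are the QsNetII-on-Opteron numbers quoted above.
# Assumption: "MB" means 10**6 bytes.

LATENCY_S = 1.5e-6        # 1.5 us MPI latency
BANDWIDTH_BPS = 875e6     # 875 MB/s peak MPI bandwidth


def transfer_time(nbytes, latency=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Modelled time in seconds to send one message of nbytes."""
    return latency + nbytes / bandwidth


def effective_bandwidth(nbytes, latency=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Modelled achieved bandwidth in bytes/s for a message of nbytes."""
    return nbytes / transfer_time(nbytes, latency, bandwidth)


def half_bandwidth_size(latency=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Message size at which the model reaches half of peak bandwidth."""
    return latency * bandwidth


for size in (64, 1024, 65536, 1048576):
    print(size, effective_bandwidth(size) / 1e6, "MB/s")
```

In this model, messages much smaller than the half-bandwidth size are
dominated by latency, so an application that sends many small messages cares
more about the latency figure than the bandwidth figure.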
All vendors have a variety of switch sizes - either as a fixed size
configuration - or as a chassis that takes one or more line cards that can be
upgraded if your cluster gets expanded. Some solutions such as Myrinet revE
cards need two switch ports per node but otherwise you just need a switch big
enough for your node count and allowing for possible future expansion.
Very large clusters have multiple switch cabinets arranged as node-level
switches which have links to the nodes and top-level 'spine' switch cabinets
that interconnect the node-level cabinets. If you have the same number of
links to the spine switches as you do to the actual nodes then you should
have 'full bisectional bandwidth'. However, you can save money by cutting
back on the amount of spine switching you buy.
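
The trade-off can be put into numbers with a small sketch. This is an
illustration only: the link counts below are hypothetical, not taken from any
particular switch, and the function names are invented for this example.

```python
# Ratio of spine-facing links to node-facing links on a node-level switch.
# Link counts used in the examples are hypothetical.

def bisection_fraction(node_links, spine_links):
    """Fraction of full bisectional bandwidth, capped at 1.0."""
    if node_links <= 0 or spine_links < 0:
        raise ValueError("link counts must be positive")
    return min(1.0, spine_links / node_links)


def oversubscription(node_links, spine_links):
    """How many node links share each spine link (1.0 means none shared)."""
    if spine_links <= 0:
        raise ValueError("need at least one spine link")
    return node_links / spine_links


print(bisection_fraction(16, 16))  # as many up-links as node links
print(bisection_fraction(16, 8))   # half the spine links
print(oversubscription(16, 8))
```

Whether the reduced configuration is acceptable depends on how much traffic
your application actually sends between node-level switches.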
Many interconnect vendors offer a choice of copper or fibre cabling. The
former is often cheaper (no expensive lasers) but the latter can be used for
longer cable runs and is often easier to physically manage particularly when
installing very large clusters.
What to buy depends very much on your application. Maybe you haven't proved
that your GigE is the limiting factor. I do have figures for Fluent on ours
and other interconnects but the Beowulf list is not the correct place to post
them.
As Robert pointed out, most vendors will loan equipment for a month or so and
indeed many can provide external access to clusters for benchmarking
purposes. Also, for example, the AMD Developer Center has large Myrinet and
Infiniband clusters that you can ask to get access to.
Hope this helps,
Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505
----------------------- www.quadrics.com --------------------