[Beowulf] How to configure a cluster network

Jan Heichler jan.heichler at gmx.net
Thu Jul 24 11:14:43 PDT 2008

Hello Daniel,

On Thursday, 24 July 2008, you wrote:

[network configurations]

I have to say I am not sure that all the configs you sketched really work. I have never seen anybody create loops in an IB fabric.

DP> Since I am not network expert I would be glad if somebody explains
DP> why the first solution is the best one.

Let me put it this way:

1) Most applications are latency driven, not bandwidth driven. That means that half bisection bandwidth does not cut your application performance down to 50%. For most applications the impact should be less than 5%; for some it is really 0%.

2) Static routing in IB networks limits your bandwidth for many of the possible communication patterns anyway. For completely random communication the usable bandwidth was something like below 50%. So you buy an IB fabric with full bisection bandwidth but can't use all of it anyway, and reducing the bisection bandwidth does not hurt that much anymore (as far as I understood most whitepapers).

3) Today you usually have 4 or 8 cores in one node. 12 nodes times 4 or 8 cores makes 48 or 96 cores that are connected with one hop on the same switch. Many applications don't scale to that number of processes anyway. Before you try to optimize the network to the maximum, it may be better to think about your application, your usual job sizes and the scheduling of the jobs. Try to avoid "cross switch communication" if possible. Say you run small jobs of 8 nodes each, with 12 nodes on each switch and half bisection bandwidth between the switches: job 1 gets 8 nodes on the first switch, and job 2 gets 4 nodes on switch one and 4 on switch two. Your bisection bandwidth is big enough to handle this.
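
To make the arithmetic in 3) concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: two switches, jobs placed by filling the first switch before the second, and "half bisection bandwidth" read as 6 inter-switch links for 12 nodes per switch (the uplink count is my assumption, not something from the original configs). The function names are made up for this example.

```python
def cores_per_switch(nodes_per_switch, cores_per_node):
    """Cores reachable with a single hop on one switch."""
    return nodes_per_switch * cores_per_node


def place_jobs(job_sizes, switches, nodes_per_switch):
    """Fill switches in order; return, per job, the node count on each switch."""
    free = [nodes_per_switch] * switches
    placements = []
    for size in job_sizes:
        alloc = [0] * switches
        remaining = size
        for s in range(switches):
            take = min(free[s], remaining)
            alloc[s] = take
            free[s] -= take
            remaining -= take
            if remaining == 0:
                break
        if remaining:
            raise ValueError("not enough free nodes for job of size %d" % size)
        placements.append(alloc)
    return placements


def worst_case_cross_flows(alloc):
    """Two-switch case only: at most min(a, b) node pairs can talk
    across the inter-switch links at the same time (simplified model,
    one full-rate flow per node)."""
    a, b = alloc
    return min(a, b)


if __name__ == "__main__":
    nodes_per_switch = 12
    uplinks = nodes_per_switch // 2  # assumption: half bisection = 6 links

    for cores in (4, 8):
        print(cores, "cores/node:", cores_per_switch(nodes_per_switch, cores))

    for job, alloc in enumerate(place_jobs([8, 8], 2, nodes_per_switch), 1):
        flows = worst_case_cross_flows(alloc)
        print("job", job, "placement", alloc,
              "cross-switch flows", flows, "fits:", flows <= uplinks)
```

With two 8-node jobs, the first job stays entirely on switch one and the second is split 4 and 4, so under this simplified model at most 4 flows cross between the switches, which is below the 6 assumed uplinks.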

I vote for the fat tree in picture one because I know it works, and with 1) to 3) above it will give you good performance, especially if you run more than just one application (optimizing mostly means optimizing for a single use case; if you have more than one, it is hard to optimize).
