Need advice on cluster hardware
Ole W. Saastad
ole at scali.no
Mon Jan 14 00:13:24 PST 2002
"Eray Ozkural (exa)" <erayo at cs.bilkent.edu.tr> and Ron Choy wrote:
> > (* The reason why I have gigabit nics and 10/100 switch is that I don't
> > know if bandwidth is going to be a limit on the computations so I would
> > rather start out small and expand later. (is this a good idea?) )
> Linear Algebra problems are likely to require a lot of bisection bandwidth.
> Is that switch going to work with your 1000Base-TX (?) NIC's at all? I assume
> you'd be better off with a switch that suits your hardware. If you have the
> budget go for a gigabit switch.
> Will you work on dense or sparse problems? Your requirements are likely to
> differ for the type of matrices you will use, and of course the kind of
> research you will make. If you are a computational scientist you'd like a
> faster network, if you are a computer scientist you might need a slower
> network to show that your algorithm is effective in low-bandwidth
If you want to start small, start with bonded 100 Mbit ethernet; this is
low cost and gives you relatively high bandwidth.
The bandwidth obtained with gigabit is never what you would hope for, more [...]
Mbytes/sec. The high bandwidth you get with 1000T is nice for
file transfers, but as with all ethernet the latency is killing you. With
11-12 Mbytes/sec. (or approximately double with bonded) the latency is a
problem that is possible to live with. However, when using gigabit ethernet
the latency with TCP/IP is really killing you. When the price difference
between 100 and 1000 Mbit becomes zero and you get much better performance
at no extra cost, the picture changes somewhat. The latency is still your
bottleneck, with latency running over 100 microseconds for a TCP/IP
interconnect. It is the protocol that mainly accounts for this latency.
In short, the message is that for most applications it does not help very
much to replace fast ethernet with gigabit ethernet when you must pay a lot
of money for 1000T cards [...]
I would recommend an interconnect with lower latency, like SCI. With latency
less than 4 microseconds and measured bandwidth over 300 MB/sec., SCI [...]
gigabit network. (Myrinet is an alternative.) If you want to start
small, a few [...]
wulfkit would enable you to set up a small 2x or 4x cluster to test your [...]
and verify your bottlenecks.
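
To make the latency argument concrete, here is a rough sketch using the simple
linear cost model time = latency + size / bandwidth. The model itself, the
function names, and the message sizes are assumptions chosen for illustration.
The figures are the ones quoted above: 100 microseconds is used as a lower
bound for TCP/IP over fast ethernet, 11.5 Mbytes/sec. is the midpoint of the
11-12 range, and SCI is taken at 4 microseconds and 300 MB/sec. Real
interconnects will not follow the model exactly, so measure your own
application before deciding.

```python
# Sketch only: linear cost model, not a benchmark.
# With 1 MB taken as 10**6 bytes, 1 MB/sec equals 1 byte per microsecond.

def transfer_time_us(size_bytes, latency_us, bandwidth_mb_s):
    """Estimated time in microseconds to deliver one message."""
    return latency_us + size_bytes / bandwidth_mb_s


def break_even_size(latency_us, bandwidth_mb_s):
    """Message size in bytes at which latency equals transmission time.

    Below this size the message cost is dominated by latency.
    """
    return latency_us * bandwidth_mb_s


# (latency in microseconds, bandwidth in MB/sec), from the figures above.
FAST_ETHERNET_TCP = (100.0, 11.5)  # assumed: ">100 us" and midpoint of 11-12
SCI = (4.0, 300.0)                 # assumed: "<4 us" and ">300 MB/sec"

if __name__ == "__main__":
    print("size(bytes)  fast ethernet(us)  SCI(us)")
    for size in (64, 1024, 65536, 1048576):  # illustrative message sizes
        t_eth = transfer_time_us(size, *FAST_ETHERNET_TCP)
        t_sci = transfer_time_us(size, *SCI)
        print(f"{size:11d}  {t_eth:17.1f}  {t_sci:7.1f}")
```

Under this model a small message costs about the full latency regardless of
bandwidth, which is why raising bandwidth alone does little for applications
that exchange many short messages.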
Ole W. Saastad, Dr.Scient.
Scali AS P.O.Box 70 Bogerud 0621 Oslo NORWAY
Tel:+47 22 62 89 68(dir) mailto:ole at scali.no http://www.scali.com
ScaMPI: bandwidth .gt. 300 MB/sec. latency .lt. 4 us.