[Beowulf] dual-core benefits?

Fri Sep 23 06:43:17 PDT 2005

> > is this scalability assuming a slow interconnect like gigabit?
> Yes, gigabit on Pentium 4 cluster.

well, the e1000 is a decent nic, but it is sometimes configured
with too much interrupt mitigation to suit HPC.  even with that fixed,
it's nowhere near the performance of a real cluster interconnect
(~2 us latency, 400-1800 MB/s bandwidth).

> > have you considered when it would be appropriate to go to something fast?
> Yes, that is probably sth that we will consider after trying gigabit and two
> network interfaces per mb. 

dual-port has a reputation for not helping much.  it's only a small
boost in bandwidth in the ideal case.

> > on a multiprocessor system, you effectively have a pretty fast, if small,
> > interconnect.  if your code can take advantage of that, then going
> > dual-core could well be a win.  for instance, if your code is limited
> > by short-message, point-to-point latency, then increasing "SMP-ness"
> > should help a lot, especially if you are assuming mere gigabit.
> 
> Well, actually I'm still not sure about this. The CPUs inside the node will
> communicate fast, but then the network will be a bottleneck?

I was careful in what I said, but perhaps not explicit enough.  if your
code sends a lot of short, point-to-point (p-p) messages, then the MPI will
avoid the nic on paths within the machine (at least the myrinet and
quadrics MPIs do).  that's a significant win: on an older opteron cluster
I have, intra-node is 0.8 us / 800 MB/s (latency / bandwidth), vs
3.5 us / 240 MB/s inter-node.

so if you're only scaling to 8 cpus, and you use dual-socket, dual-core
nodes, roughly half your messages will be very fast.  if the inter-node
fabric is gigabit, that means 0.8 us / 800 MB/s vs 30 us / 80 MB/s.
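
to see why latency matters so much for short messages, here is a rough
sketch using the usual first-order model, time = latency + size/bandwidth.
the model itself and the 1 KB message size are assumptions for illustration,
not measurements; the latency/bandwidth pairs are the ones quoted above.

```python
# first-order cost model (an assumption, not a measurement):
#   time = latency + size / bandwidth
# 1 MB/s is treated as 1 byte per microsecond (MB = 10**6 bytes).

# (latency in us, bandwidth in MB/s), taken from the figures in this post
LINKS = {
    "intra-node (shared memory)": (0.8, 800.0),
    "inter-node (fast interconnect)": (3.5, 240.0),
    "inter-node (gigabit)": (30.0, 80.0),
}


def transfer_time_us(size_bytes, latency_us, bandwidth_mb_per_s):
    """Estimated one-way time in microseconds for a single message."""
    return latency_us + size_bytes / bandwidth_mb_per_s


if __name__ == "__main__":
    size = 1024  # hypothetical "short" message, in bytes
    for name, (lat, bw) in LINKS.items():
        t = transfer_time_us(size, lat, bw)
        # show how much of the total is pure latency
        print(f"{name:32s} {t:6.2f} us  ({lat / t:.0%} latency)")
```

under this model a 1 KB message over gigabit spends about 70% of its time
in latency, so extra bandwidth alone would not help it much.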

so just looking at an 8p cluster, assuming only p-p messages, uniformly
distributed (figures are latency in us / bandwidth in MB/s):
	two 2x2's (dual-socket, dual-core) will see .8/800 on half of all
	messages and 30/80 on the rest, or 15.4/440 on average.

	four 2x1's (dual-socket, single-core) will see .8/800 on a quarter
	and 30/80 on the rest, or 22.7/260.

this is pretty rough, of course: if you had the right patterns, 
you could do better or worse.  and if you use collectives, you'll
always be limited at least by inter-node performance.
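
the 8p estimates above are just weighted averages, and can be reproduced
with a short sketch.  the function names are made up for illustration, and
procs_per_node / total_procs is only my reading of where the "half" and
"quarter" fractions come from.  averaging bandwidth arithmetically is also
a crude summary, kept here only to match the figures above.

```python
# (latency in us, bandwidth in MB/s), from the figures in this post
INTRA = (0.8, 800.0)
GIGABIT = (30.0, 80.0)


def intra_fraction(procs_per_node, total_procs):
    """Rough share of uniformly distributed messages that stay on-node.

    Assumption: this simple ratio gives the 1/2 and 1/4 used above.
    """
    return procs_per_node / total_procs


def blended(frac_intra, intra=INTRA, inter=GIGABIT):
    """Weighted average of latency and bandwidth over both path types."""
    lat = frac_intra * intra[0] + (1.0 - frac_intra) * inter[0]
    bw = frac_intra * intra[1] + (1.0 - frac_intra) * inter[1]
    return lat, bw


if __name__ == "__main__":
    total = 8
    for label, ppn in (("two 2x2 nodes", 4), ("four 2x1 nodes", 2)):
        lat, bw = blended(intra_fraction(ppn, total))
        print(f"{label}: {lat:.1f} us / {bw:.0f} MB/s")
```

if you exclude a rank sending to itself, the on-node share becomes
(procs_per_node - 1) / (total_procs - 1), i.e. 3/7 and 1/7, which shifts
both cases a little further toward the gigabit numbers.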

regards, mark hahn.