[Beowulf] Better C2D or Quadcore

Bill Broadley bill at cse.ucdavis.edu
Tue Nov 27 06:18:40 PST 2007

amjad ali wrote:
> Hello,
> I planned to buy 9 PCs each having one Core2Duo E6600 (networked with GiGE)
> to make cluster for running PETSc based applications.

Ideally you would plan on buying $x of cluster instead of limiting your
choices to a particular number of PCs.  There are 1,2,4 socket machines
with 1,2,4 cores per socket.  Not to mention a wide range of interconnects.

> I got an advice that because the prices of Xeon Quadcore is going to drop
> next month, so I should buy 9 PCs each having one Quadcore Xeon (networked
> with GiGE) to make cluster for running PETSc based applications.
> Which is better for me to get better performance/speedup?
> My question is due to following as given in PETSc-FAQ:
> *What kind of parallel computers or clusters are needed to use PETSc?*
> PETSc can be used with any kind of parallel system that supports MPI. BUT for
> any decent performance one needs
>    - a fast, low-latency interconnect; any ethernet, even 10 gigE simply
>    cannot provide the needed performance.

Usually that kind of statement would include a qualifier, something like "to
scale to 64 nodes".  Often (but not always) it's harder to scale as the number
of nodes increases.  InfiniPath has really low latency, as do Mellanox's
ConnectX (if it's shipping) and Myrinet 10G.  All are better than the usual 10G
Ethernet numbers (that I've seen anyway).  Is anyone else under 3.0 us?

Of course if it only scales within a node then you want as many cores
as possible within the node.

>    - high per-CPU memory performance. Each CPU (core in dual core
>    systems) needs to have its own memory bandwidth of roughly 2 or more
>    gigabytes.

Er, presumably that's 2 or more GB/sec.

> For example, standard dual processor "PC's" will
> not provide better performance when the second processor is used, that

Er, standard dual processor PCs can hit 4GB/sec.  Even my $750 desktop from
Dell, with lousy memory and a 1.8 GHz CPU, gets 4GB/sec on STREAM add and triad:
Function      Rate (MB/s)   Avg time     Min time     Max time
Add:         3945.7460       0.0124       0.0122       0.0126
Triad:       3951.5930       0.0124       0.0121       0.0129

You could do substantially better with a different compiler, CPU, or DIMMs.

Not to mention using OpenMP with an OpenMP-friendly compiler (gfortran,
PathScale, or Portland Group).
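
For anyone who wants to check their own box, here is a minimal sketch of a
STREAM-style triad measurement.  It is not the official STREAM benchmark: the
function names, the heap-allocated arrays, and the best-of-N timing are my own
assumptions for illustration, and the real benchmark is more careful about
array sizes and about validating results.  The byte count follows the STREAM
convention of two reads plus one write per element, reported in units of 10^6
bytes per second.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Bytes moved by one triad pass over n doubles: read b, read c, write a. */
double triad_bytes(size_t n)
{
    return 3.0 * (double)sizeof(double) * (double)n;
}

/* STREAM-style triad: a[i] = b[i] + scalar * c[i].
 * Runs `reps` timed passes and returns the rate of the fastest pass in
 * MB/s (10^6 bytes per second).  Returns -1.0 on allocation failure, a
 * timer reading of zero, or a wrong result.  Hypothetical helper, not
 * part of STREAM itself. */
double triad_mb_per_sec(size_t n, int reps)
{
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    double best = -1.0;
    double rate = -1.0;
    const double scalar = 3.0;

    if (a && b && c && n > 0) {
        for (size_t i = 0; i < n; i++) {
            a[i] = 0.0;
            b[i] = 1.0;
            c[i] = 2.0;
        }
        for (int r = 0; r < reps; r++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            /* The pragma is ignored unless OpenMP is enabled at compile time. */
            #pragma omp parallel for
            for (long i = 0; i < (long)n; i++)
                a[i] = b[i] + scalar * c[i];
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double dt = (double)(t1.tv_sec - t0.tv_sec)
                      + 1e-9 * (double)(t1.tv_nsec - t0.tv_nsec);
            if (dt > 0.0 && (best < 0.0 || dt < best))
                best = dt;
        }
        /* Check one element so the loop result is actually used: 1 + 3*2 = 7. */
        if (best > 0.0 && a[n / 2] == 7.0)
            rate = 1.0e-6 * triad_bytes(n) / best;
    }
    free(a);
    free(b);
    free(c);
    return rate;
}
```

Call it with arrays much larger than cache, for example
triad_mb_per_sec(20000000, 10), otherwise you measure cache rather than
memory.  Building with OpenMP enabled (-fopenmp on gcc) lets the loop use
all cores; without it the same code runs single threaded.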

> is, you
>    will not see speed-up when you using the second processor. This is because
>    the speed of sparse matrix computations is almost totally determined by the
>    speed of the memory, not the speed of the CPU.

I've seen over 15GB/sec from a dual socket 8 core Barcelona.  If what they
say is true (even 10G doesn't scale), it might be your best performance
even if you can afford substantially fewer than 9 of them.   Small vendors
seem to be selling systems in the $2-3k range for reasonable dual socket
