Replacing Quad proc SMP multi node DEC Alpha Cluster with Linux Dual P4 cluster?

Rayson Ho raysonlogin at
Thu Apr 18 10:54:06 PDT 2002

--- shin at wrote:
> 1. In terms of the floating point performance, looking at CFP2000 on
> and the Xeon should offer much better FP performance
> than the older Alphas we have. I could only find results for a 4100
> 5/533 (which is the closest to our current setup) and these were
> much lower than the results from Dell Precision Workstation 530 with
> 2.0GHz proc.

Since you are using the processors in SMP configurations, you should be
looking at SPECfp_rate2000. 

SPECfp2000 tells you how fast your code runs on a single CPU. But
SPECfp_rate2000 runs multiple copies of the benchmarks at once, so you
can see how well a processor scales in an SMP configuration.
One thing I found last year was that when the processors share the memory
bandwidth on an SMP machine, the performance is really bad. My
configuration was dual-P3s with Myrinet. I measured the performance of
the cluster running MPI programs in two ways: 8 machines with 1 process
on each, and 4 machines with 2 processes on each. To my surprise, 1
process on each of 8 machines performed better.

My prof. then gave us his results on an IBM server machine (not a PC
server); his results were the opposite. His conclusion was that the
memory bandwidth of PCs does not scale with the number of processors.

(BTW, it was an assignment -- I wasn't the only one who found similar
results -- there were around 20 people in that class)
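
The shared-bus effect can be sketched with a toy model: if the
processes on a node together want more memory bandwidth than the bus
can deliver, each one gets an equal slice of the bus. Everything here
(the function name, the equal-share assumption, and the MB/s figures)
is hypothetical and only meant to show the shape of the problem, not
to reproduce the measurements from the assignment.

```python
def per_process_bandwidth(bus_bandwidth, demand_per_process, procs_per_node):
    """Memory bandwidth each process gets on a node with one shared bus.

    Assumes (for illustration only) that the bus is split evenly
    once the combined demand exceeds what the bus can deliver.
    """
    if procs_per_node < 1:
        raise ValueError("need at least one process per node")
    total_demand = demand_per_process * procs_per_node
    if total_demand <= bus_bandwidth:
        return demand_per_process
    return bus_bandwidth / procs_per_node


# Hypothetical node: 800 MB/s bus, each process wants 600 MB/s.
one_per_node = per_process_bandwidth(800.0, 600.0, 1)
two_per_node = per_process_bandwidth(800.0, 600.0, 2)
print(one_per_node, two_per_node)  # prints 600.0 400.0
```

The model ignores the interconnect entirely, so it says nothing about
the extra network traffic in the 8-machine run; it only shows why two
memory-hungry processes on one bus can each run slower than one alone.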

> 2. Quad systems seem to be way more expensive than duals and I could
> only find quad systems running at 900MHz per proc instead of 2GHz in
> the duals - so I assume the quads are out on cost and proc. speed
> alone.

I believe a quad system will not give you double the performance of
a dual, even if you use the 2GHz CPUs. The PC (or should I say
Intel??) architecture has a shared memory bus, which does not scale
with the number of CPUs.

BTW, is AMD MP better?? I've heard that each Athlon MP CPU talks to the
system chipset over its own bus.

> 3. One of my concerns was the use of mpi across 8xdual Xeon nodes
> versus 3xquad alpha nodes. I'm assuming that mpi(ch) will look after
> all the necessary for us in terms of communication between
> processors within a node and communication across nodes - but is the
> speed of memory, throughput etc a limiting factor on this type of PC
> architecture? Will we hit latency issues within a node that we're
> not currently hitting?

See above.

You can actually use OpenMP within the nodes and MPI between the nodes.
However, MPICH and LAM MPI are not thread safe, so you would have to
make all the MPI calls from a single thread in each process.

