Replacing Quad proc SMP multi node DEC Alpha Cluster with Linux Dual P4 cluster?

Andrew Shewmaker shewa at inel.gov
Thu Apr 18 12:08:30 PDT 2002


Rayson Ho wrote:

>--- shin at guss.org.uk wrote:
>
>
>One thing I found last year was that when the processors share the memory
>bandwidth on an SMP machine, the performance is really bad. My
>configuration was dual P3s with Myrinet. I measured the performance of
>the cluster running MPI programs using 8 machines with 1 process on
>each machine, and 4 machines with 2 processes on each. To my surprise,
>1 process on 8 machines had better performance.
>
For the 1 process on 8 machines case, were those 8 machines also duals?  If
they were, did you notice if one CPU was taking care of networking overhead
while the other was doing work?  If these were duals, did you also run the
same tests on a similar-speed uniprocessor system?  Just wondering.

>
>
>My prof. then gave us his results on an IBM server machine (not a PC
>server); his results were the opposite. His conclusion was that the
>memory bandwidth of PCs does not scale with the number of processors.
>
>(BTW, it was an assignment -- I wasn't the only one who found similar
>results -- there were around 20 people in that class)
>
>>2. Quad systems seem to be way more expensive than duals and I could
>>only find quad systems running at 900MHz per proc instead of 2GHz in
>>the duals - so I assume the quads are out on cost and proc. speed
>>alone.
>>
>
>I believe the performance of quad systems will not be double that of
>the duals, even if you use the 2GHz CPUs. The PC (or should I say
>Intel??) architecture has a shared memory bus, which does not scale
>with the number of CPUs.
>
>BTW, is AMD MP better?? I've heard that each Athlon MP CPU talks to its
>own system chipset.
>
I have mostly used dual AMDs in a high throughput rather than high performance
setting.  Our CFD codes would take 99% of each processor and 500-800 MB each
of RAM and would not interfere with each other.  Each would complete in the
same amount of time as we saw in the 1:1 case.

We also are using a Monte Carlo based code over PVM and there is almost no
difference between 1*8 and 2*4.  I don't remember how much memory each
process uses, though (less than the above).
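
A rough way to see why some codes show no difference between 1*8 and 2*4
while others slow down badly is a toy model in which the processes on a
node split one memory bus.  The sketch below is only that: the function
name, the bus figure and the per-process demand figures are all made up
for illustration and are not measurements from any machine in this
thread.  It also ignores caches and network traffic entirely.

```python
def slowdown(procs_per_node, demand_mb_s, bus_mb_s):
    """Estimated run-time multiplier relative to an unconstrained process.

    Toy model (an assumption, not a measurement): all processes on a node
    share one memory bus, and a process only slows down once the combined
    demand exceeds what the bus can deliver.
    """
    total_demand = procs_per_node * demand_mb_s
    return max(1.0, total_demand / bus_mb_s)


# Hypothetical figures chosen only to show the two regimes.
BUS_MB_S = 800.0
CODES = [("bandwidth-bound", 700.0), ("compute-bound", 150.0)]

for name, demand in CODES:
    one = slowdown(1, demand, BUS_MB_S)
    two = slowdown(2, demand, BUS_MB_S)
    print(f"{name}: 1 proc/node x{one:.2f}, 2 procs/node x{two:.2f}")
```

With numbers like these, a code that nearly saturates the bus on its own
takes a large hit when two copies share a node, while a code that is
mostly compute-bound sees no penalty, which would be consistent with
both sets of results reported above.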

We have been pleased so far.

Andrew




More information about the Beowulf mailing list