>2 p4 processor systems

Mikhail Kuzminsky kus at free.net
Wed Aug 28 08:44:20 PDT 2002


According to Brian LaMere
> So I'm trying to find out if anyone knows of a 4-way p4 system out there.
> I'm wanting to bring a couple dual-p4's in here just so they'll see that the
> performance far surpasses the current per-node performance we have on our
> cluster, but...brick wall.  The guy above me agrees with me, the guy above
> him won't talk to me about it.  He just gets all excited about a 6-way p3
> server in 1u.  Whoopie.
> So...help?  Anyone know of any 4-way p4 systems?  And no, amd isn't an
> option (unfortunately).
> 
   I want to add a few words about the "minuses" of x86 SMPs. We use 2-CPU
Tyan S2460 boards with Athlon MP, which does not need as much memory
throughput as the P4 to reach high performance. We ran the STREAM tests
on an S2460 with Athlon MP 1800+ (loops parallelized with OpenMP, compiled
with ifc 5.0) and found that the 2-CPU (2-thread) results are no better
than the 1-CPU results. You can find similar results for 2-CPU SMPs at
//www.streambench.org.
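
For readers who have not used STREAM, here is a minimal sketch of how a
triad result turns into a bandwidth figure and a 1-thread vs. 2-thread
ratio. The function names and the timings are made up for illustration;
they are not the numbers we measured on the S2460. The only fixed part is
the accounting: the triad a[i] = b[i] + q*c[i] moves three 8-byte doubles
per element, and bandwidth is reported in units of 10^6 bytes/s.

```python
# STREAM-style bandwidth accounting for the triad kernel a[i] = b[i] + q*c[i].
BYTES_PER_DOUBLE = 8


def triad_bandwidth_mb_s(n_elements, seconds):
    """Bandwidth in MB/s (10^6 bytes/s) for one triad pass.

    Each element costs two loads (b[i], c[i]) and one store (a[i]).
    """
    bytes_moved = 3 * BYTES_PER_DOUBLE * n_elements
    return bytes_moved / seconds / 1.0e6


def thread_scaling(bw_one_thread, bw_two_threads):
    """Ratio of 2-thread to 1-thread bandwidth; about 1.0 means no gain."""
    return bw_two_threads / bw_one_thread


# Hypothetical timings, NOT measurements from any real board.
one = triad_bandwidth_mb_s(2_000_000, 0.080)
two = triad_bandwidth_mb_s(2_000_000, 0.078)
print(f"1 thread: {one:.0f} MB/s, 2 threads: {two:.0f} MB/s, "
      f"ratio {thread_scaling(one, two):.2f}")
```

A ratio near 1.0 is what "2-thread results are no better than 1 CPU" looks
like in these terms: the second CPU adds no memory bandwidth.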

Some applications scale relatively well from 1 to 2 CPUs on an SMP.
Examples are
1) Linpack (n=100 and n=1000), which works mostly out of cache;
2) the Gaussian 98 SCF method, where cache locality is also high.
In the latter case the speedup on test178 is about 1.7 (I don't remember
the exact figure).
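
To put the 1.7 figure in perspective, here is a small sketch that turns a
measured speedup into a parallel efficiency and, assuming Amdahl's law
describes the run (my assumption, not something we measured), into an
effective serial fraction. The function names are only illustrative, and
1.7 itself is the approximate value quoted above.

```python
def parallel_efficiency(speedup, n_cpus):
    """Fraction of the ideal n-fold speedup that was actually obtained."""
    return speedup / n_cpus


def amdahl_serial_fraction(speedup, n_cpus):
    """Serial fraction f that satisfies speedup = 1 / (f + (1 - f) / n_cpus).

    This assumes Amdahl's law is a fair model of the run; memory contention
    shows up here as if it were serial work.
    """
    if n_cpus < 2:
        raise ValueError("need at least 2 CPUs to estimate a serial fraction")
    return (n_cpus / speedup - 1.0) / (n_cpus - 1.0)


speedup, cpus = 1.7, 2  # approximate test178 figure quoted above
print(f"efficiency      = {parallel_efficiency(speedup, cpus):.2f}")
print(f"serial fraction = {amdahl_serial_fraction(speedup, cpus):.3f}")
```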

But high-performance calculations, in particular many of the methods
implemented in g98, are now memory-bound. So you should expect poor speedup
on 2-CPU x86 systems because of the memory bottleneck. Most 2-CPU x86 SMPs
have single-ported main memory, so contention for memory between modern
x86 CPUs will be high (especially for the P4, whose SPECfp2000 results
depend significantly on memory throughput). So it is not clear to me that
2-CPU SMPs are more attractive than two single-CPU nodes (yes, we should
work out the price/performance ratio ...).

As for 4-CPU SMPs, in some of the cases I have looked at the architecture
scales badly in memory throughput (but those were older PIII-based
systems). So you need to be sure that a 4-CPU P4 system has efficient
memory throughput; otherwise two 2-CPU SMPs may be better.
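
The two comparisons above (one 2-CPU SMP vs. two single-CPU nodes, and one
4-CPU SMP vs. two 2-CPU SMPs) can be sketched with a very crude contention
model. Everything in it is an assumption for illustration: every node is
given the same shared memory bandwidth, each CPU wants a fixed bandwidth to
run at full speed, interconnect cost between nodes is ignored, and the
2.0 and 1.5 GB/s figures are invented, not measured.

```python
def node_throughput(cpus, node_bw, cpu_demand):
    """Work rate of one SMP node, in units of one unconstrained CPU.

    All CPUs in the node share node_bw (GB/s); each CPU needs cpu_demand
    (GB/s) to run at full speed. A demand of 0 models a cache-resident job.
    """
    if cpu_demand <= 0:
        return float(cpus)
    return min(float(cpus), node_bw / cpu_demand)


def cluster_throughput(nodes, cpus_per_node, node_bw, cpu_demand):
    """Aggregate work rate of identical nodes, ignoring interconnect cost."""
    return nodes * node_throughput(cpus_per_node, node_bw, cpu_demand)


NODE_BW = 2.0     # hypothetical shared memory bandwidth per node, GB/s
CPU_DEMAND = 1.5  # hypothetical bandwidth one CPU needs at full speed, GB/s

for nodes, cpus in [(1, 2), (2, 1), (1, 4), (2, 2)]:
    rate = cluster_throughput(nodes, cpus, NODE_BW, CPU_DEMAND)
    print(f"{nodes} node(s) x {cpus} CPU(s): {rate:.2f}")
```

With a memory-bound job the split configurations come out ahead in this
model, while with cpu_demand = 0 (everything in cache) all configurations
with the same total CPU count are equal. Whether real 4-CPU P4 boards
behave like this depends on their memory subsystem, which is exactly the
thing to check.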

Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry
Moscow


