What do you guys think about the P4?

Niclas Andersson nican at nsc.liu.se
Wed Apr 4 17:16:45 PDT 2001

> So my question is: would any of you build a P4
> cluster if your application was bottlenecked by the memory subsystem (e.g.
> fluid dynamics)? Any comments?

I had the opportunity to run some benchmarks on a 1.4 GHz P4.

STREAM shows wonderful bandwidth from the RDRAM. The NAS benchmark numbers on
the 1.4 GHz P4 are also remarkable. On the application kernels (bt, sp, lu) I
get approximately twice the speed of a 1 GHz Athlon with a 133 MHz FSB and
SDRAM (and the Athlon performance is already quite decent!). On some of the
other, smaller kernels (e.g. cg) the speedup was even higher. Those numbers
could actually justify the higher price of a P4 system, especially if one is
investing in an expensive network anyway.
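
For readers who have not looked at STREAM: its bandwidth figures come from
simple vector loops such as the "triad" below. This is a minimal sketch in the
spirit of STREAM's triad, not the official benchmark source. The names
`triad` and `triad_bandwidth_mbs` are hypothetical, it assumes a POSIX
`clock_gettime`, and it counts 24 bytes per iteration (two loads, one store),
following the STREAM convention of ignoring write-allocate traffic.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* STREAM-style triad: a[i] = b[i] + scalar * c[i]. */
static void triad(double *a, const double *b, const double *c,
                  double scalar, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}

/* Hypothetical helper: best-of-reps triad bandwidth in MB/s
   (1 MB = 1e6 bytes). Counts 24 bytes per iteration (load b, load c,
   store a), ignoring write-allocate traffic. Returns -1.0 on
   allocation failure or a wrong result. */
double triad_bandwidth_mbs(size_t n, int reps)
{
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    if (!a || !b || !c) {
        free(a); free(b); free(c);
        return -1.0;
    }
    for (size_t i = 0; i < n; i++) {
        a[i] = 0.0; b[i] = 1.0; c[i] = 2.0;
    }

    double best = 1e30;  /* shortest time over all repetitions */
    for (int r = 0; r < reps; r++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        triad(a, b, c, 3.0, n);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double s = (double)(t1.tv_sec - t0.tv_sec)
                 + 1e-9 * (double)(t1.tv_nsec - t0.tv_nsec);
        if (s < best) best = s;
    }

    /* Check the result (1.0 + 3.0 * 2.0 == 7.0); this also keeps the
       compiler from discarding the stores to a[]. */
    int ok = 1;
    for (size_t i = 0; i < n; i++)
        if (a[i] != 7.0) { ok = 0; break; }
    free(a); free(b); free(c);
    if (!ok) return -1.0;
    if (best <= 0.0) best = 1e-9;  /* guard against timer resolution */
    return 3.0 * sizeof(double) * (double)n / best / 1e6;
}
```

Call it with arrays well beyond cache size, e.g.
`triad_bandwidth_mbs(5000000, 10)` (about 120 MB in total). Because every
access is sequential, hardware prefetching can hide much of the memory latency
in a loop like this, which may be one reason a STREAM-style number looks much
better than a real application does.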

However, when I ran some real applications (the machine had 768 MByte of
RDRAM), the performance was no longer that remarkable. First, a precompiled
LS-DYNA ran at more or less the same speed as on the AMD. Second, I tried a
particle-in-cell (PIC) code for plasma simulation and tweaked the PGI compiler
options a bit. Performance improved to approximately 20% faster than the AMD,
but that is still well short of the increase in clock rate, not to mention the
price tag.
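
A rough per-clock comparison, using only the approximate figures quoted above
(1.4 GHz P4 vs 1 GHz Athlon, about 2x on the NAS application kernels, about
1.2x on the tuned PIC code). Here S is the measured speedup over the Athlon,
and the ratios are only as precise as those estimates:

```latex
\frac{f_{\mathrm{P4}}}{f_{\mathrm{Athlon}}} = \frac{1.4\ \mathrm{GHz}}{1.0\ \mathrm{GHz}} = 1.4,
\qquad
\frac{S_{\mathrm{NAS}}}{1.4} \approx \frac{2.0}{1.4} \approx 1.43,
\qquad
\frac{S_{\mathrm{PIC}}}{1.4} \approx \frac{1.2}{1.4} \approx 0.86
```

In other words, per clock cycle the P4 does roughly 43% more work than the
Athlon on NAS but roughly 14% less on the PIC code.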

What is it that makes NAS run so fast on a P4 when real applications show only
a mediocre increase in speed? Is it the deep pipeline, which the compilers do
not yet exploit well? Is it the branch prediction? Or is it that even though
the bandwidth is astounding, the memory latency is still fairly high?
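
One way to separate the latency question from the bandwidth question is a
dependent pointer chase: each load address comes from the previous load, so
neither prefetching nor high bandwidth can help. This is a minimal sketch,
assuming a POSIX `clock_gettime`. The names `make_chain` and
`chase_ns_per_load` are hypothetical, and the random cycle is built with
Sattolo's algorithm.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Build next[] so that i -> next[i] is a single cycle through all n
   slots (Sattolo's algorithm). Following it makes every load depend
   on the previous one, in an order prefetchers cannot predict. */
static size_t *make_chain(size_t n, unsigned seed)
{
    if (n == 0) return NULL;
    size_t *next = malloc(n * sizeof *next);
    if (!next) return NULL;
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        /* combine two rand() calls since RAND_MAX may be only 32767 */
        size_t r = (size_t)rand() * ((size_t)RAND_MAX + 1) + (size_t)rand();
        size_t j = r % i;  /* j < i strictly: this is what yields one cycle */
        size_t t = next[i];
        next[i] = next[j];
        next[j] = t;
    }
    return next;
}

/* Hypothetical helper: average nanoseconds per dependent load over
   `steps` hops through an n-element chain. Returns -1.0 if allocation
   fails or steps is 0. The final index goes to *sink so the loop stays
   live. */
double chase_ns_per_load(size_t n, size_t steps, size_t *sink)
{
    size_t *next = make_chain(n, 12345u);
    if (!next || steps == 0) {
        free(next);
        return -1.0;
    }

    struct timespec t0, t1;
    size_t p = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++)
        p = next[p];  /* address of the next load depends on this one */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(next);
    *sink = p;
    double ns = 1e9 * (double)(t1.tv_sec - t0.tv_sec)
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

Comparing the result for a chain that fits in cache against one far larger
than cache gives a rough main-memory latency figure to put next to the STREAM
bandwidth.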

BTW, the 133 MHz FSB on the Athlon shows a nice performance increase compared
to the 100 MHz FSB, even with PC133 (non-DDR) memory.

