interesting Athlon/P4 discussion from FreeBSD-Q-l

Wed Apr 18 22:25:58 PDT 2001

On Thu, Apr 19, 2001 at 12:53:52AM -0400, Mark Hahn wrote:
> > > the P4 is not exceptional
> > > when running real code in-cache; this is why on most benchmarks
> > > other than Stream, recent Athlons beat P4's quite handily.
> > 
> > Like SPEC2000fp, for example?
> 
> notice the "in-cache" there.  yes, there is such a thing as real code 
> that is happy with 384 KB cache.

Yes. Some of the SPEC2000fp benchmarks fit pretty darn well in cache.
You cannot attribute all of the P4's SPEC2000fp performance to
STREAM; ask John McCalpin how STREAM correlates with SPEC2000fp, and
about the cache sizes used by various codes. What you'll learn is that there
is a significant boost from both the high main memory bandwidth and
from a compiler capable of generating SSE2 instructions for real code
like that found in SPEC2000fp.

If you don't like my example, you should feel free to produce a
Beowulf-relevant benchmark which uses the Intel compiler on the P4.
Feel free to adjust the problem size so that it fits within cache.
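
To make "adjust the problem size so that it fits within cache" concrete,
here is a minimal sketch. It assumes a triad-style kernel with three
double-precision arrays (the layout STREAM uses) and takes the 384 KB
figure quoted above as the cache size. The function name and its
defaults are hypothetical, chosen only for illustration.

```python
def max_elements_in_cache(cache_bytes, arrays=3, elem_bytes=8):
    """Largest per-array length such that all arrays together fit in cache.

    Assumes the arrays are the only data competing for the cache, which
    is optimistic: code, stack, and other data also take up space.
    """
    return cache_bytes // (arrays * elem_bytes)


# 384 KB of cache shared by three arrays of 8-byte doubles
print(max_elements_in_cache(384 * 1024))  # 16384
```

In practice you would pick a size somewhat below this bound, since the
estimate ignores everything else that shares the cache.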

> I find it terribly amusing that you're so offended by someone saying that
> Athlon is a viable alternative to the P4. 

But I didn't say that. I offered a specific counter-example to your
statement, one that is relevant to scientific computing. I think the
Athlon is probably the best for some set of codes, but not for the
reason you state.

> being
> interested in high-performance clustering of commodity hardware does not
> somehow rule out decent cache hit rates.

Nor do I think that. If you've ever attended any of my talks that
cover scaling, you'll have seen that I show super-linear results for
codes on the big FSL system, due to the per-node problem size
eventually fitting inside L2.
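
A rough sketch of why that happens: with a fixed total problem size,
each added node shrinks the per-node working set, and past some node
count it drops under the L2 size. The numbers below are hypothetical,
not measurements from the FSL system, and the even split with no halo
or replicated data is a simplifying assumption.

```python
def per_node_working_set(total_bytes, nodes):
    """Bytes each node touches, assuming a perfectly even split."""
    return total_bytes / nodes


def first_node_count_in_cache(total_bytes, cache_bytes, max_nodes):
    """Smallest node count whose per-node share fits in cache, or None."""
    for nodes in range(1, max_nodes + 1):
        if per_node_working_set(total_bytes, nodes) <= cache_bytes:
            return nodes
    return None


# Hypothetical sizes, chosen only for illustration
TOTAL = 64 * 2**20  # 64 MiB total problem
L2 = 2 * 2**20      # 2 MiB of L2 per node

print(first_node_count_in_cache(TOTAL, L2, 128))  # 32
```

Below that node count each node is limited by main memory bandwidth;
at or above it the working set runs out of L2, so the measured speedup
can exceed the node count.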

-- g