ATHLON vs XEON: number crunching

Jakob Oestergaard jakob at
Sat Jun 22 10:00:08 PDT 2002

On Fri, Jun 21, 2002 at 07:57:47AM -0400, Ivan Oleynik wrote:
> > Is there any chance you can re-run the benchmarks with better
> > optimization enabled?  That would be really interesting to a lot of us
> > here on the list.
> > 
> Which optimization options for the PGI compiler would you suggest to
> address memory throughput problems? I am more than willing to test this.
> My original thought was just to avoid any substantial optimization tuning,
> and use the generic -O1 option for both platforms. By the way, running the Xeon
> binary (PGI compiled with -tp piv) on the Athlon and vice versa does not make
> any substantial difference.

Sorry, I have no specific suggestions. It's been a while since I played
with PGI compilers, and I have never used them for production stuff.

My idea is that if the compiler optimizes the code "better", this
optimization will most likely reduce L1/L2 cache traffic, which in
turn will reduce memory bus traffic.  This may help if the
Athlons are running into memory throughput limits.

On the other hand - the effects may also prove to be insignificant.
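
To make the cache argument above concrete, here is a toy model. It is an assumption for illustration only and does not model the actual Athlon or Xeon caches: a fully associative LRU cache (hypothetical sizes: 32 lines of 8 words), counting misses when the same row-major 64 x 64 matrix is walked in two loop orders. Loop interchange, which turns the column walk into the row walk, is one classic example of an optimization that cuts cache (and hence memory bus) traffic; whether the PGI compiler applies it to this particular code is not something the thread establishes. The function names are made up for this sketch.

```python
from collections import OrderedDict


def count_misses(addresses, cache_lines, line_words):
    """Count misses in a toy fully associative LRU cache.

    addresses   -- sequence of word addresses touched by the program
    cache_lines -- number of lines the cache can hold
    line_words  -- words per cache line
    """
    cache = OrderedDict()  # line tag -> None, ordered least to most recently used
    misses = 0
    for addr in addresses:
        tag = addr // line_words
        if tag in cache:
            cache.move_to_end(tag)  # hit: mark line as most recently used
        else:
            misses += 1
            cache[tag] = None
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def row_order(n):
    # Walk an n x n row-major matrix along its rows (unit stride).
    return [i * n + j for i in range(n) for j in range(n)]


def column_order(n):
    # Touch the same elements, but walk down the columns (stride n).
    return [i * n + j for j in range(n) for i in range(n)]


n, lines, words = 64, 32, 8
print("row order misses:   ", count_misses(row_order(n), lines, words))
print("column order misses:", count_misses(column_order(n), lines, words))
```

With these made-up sizes the row walk misses once per cache line (512 misses), while the column walk cycles through 64 lines in a 32-line LRU cache and misses on every one of the 4096 accesses.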

> > Any chance you can try using ATLAS?
> > 
> > You would need to compile one ATLAS for the Intel CPUs and one for the
> > AMD ones.
> > 
> For the purpose of comparison, I don't need to use ATLAS, because the same
> pieces of BLAS and LAPACK source code are compiled for both platforms. It
> would make sense to use ATLAS if I could prove that by playing with
> optimization options I can tweak the binary to make the Athlon outperform
> the Xeon.

ATLAS does a lot of tweaking and experimenting by itself - not just
compiler options, but blocking parameters of the algorithms, and other
very important things that compilers just cannot do well.
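
As a rough sketch of what "blocking parameters" means, and of the kind of empirical search ATLAS automates: the matrix product below is split into nb x nb tiles so the working set of the inner loops can stay in cache, and the tile size is chosen by timing candidates on the machine itself rather than by guessing. This is a hypothetical pure-Python illustration (matmul_blocked and tune_block_size are names made up here). ATLAS itself generates and times many candidate kernels at install time, and in Python the interpreter overhead swamps any cache effect, so the tile size this picks says nothing about Athlon vs. Xeon.

```python
import random
import time


def matmul_blocked(a, b, n, nb):
    """Return C = A*B for n x n row-major matrices stored as flat lists, using nb x nb tiles."""
    c = [0.0] * (n * n)
    for ii in range(0, n, nb):
        for kk in range(0, n, nb):
            for jj in range(0, n, nb):
                # Multiply one tile of A by one tile of B, accumulating into a tile of C.
                for i in range(ii, min(ii + nb, n)):
                    row_c = i * n
                    for k in range(kk, min(kk + nb, n)):
                        aik = a[i * n + k]
                        row_b = k * n
                        for j in range(jj, min(jj + nb, n)):
                            c[row_c + j] += aik * b[row_b + j]
    return c


def tune_block_size(n, candidates, repeats=3):
    """Time each candidate tile size on random data and return the fastest one."""
    a = [random.random() for _ in range(n * n)]
    b = [random.random() for _ in range(n * n)]
    best_nb, best_time = None, float("inf")
    for nb in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            matmul_blocked(a, b, n, nb)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_nb, best_time = nb, elapsed
    return best_nb


print("fastest tile size on this machine:", tune_block_size(64, [8, 16, 32, 64]))
```

The point of the design is that the best nb depends on cache sizes and latencies, which is why a library tuned separately on each CPU can behave quite differently from the same reference BLAS source compiled for both.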

I would really recommend that you try it.  If your code spends most of
its time in BLAS/LAPACK routines, using ATLAS will probably make a
significant difference.

Switching to ATLAS may well matter more than any set of compiler options,
if your code is dominated (time-wise) by LAPACK/BLAS calls.
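
The "if your code is dominated by LAPACK/BLAS calls" condition can be put into numbers with Amdahl's law, brought in here as a standard estimate rather than something from this thread. The 2x speedup assumed for the tuned routines below is a made-up figure purely for illustration, and overall_speedup is a hypothetical helper name.

```python
def overall_speedup(blas_fraction, blas_speedup):
    """Amdahl's law: whole-program speedup when only the BLAS/LAPACK part gets faster.

    blas_fraction -- share of run time spent in BLAS/LAPACK (0..1)
    blas_speedup  -- factor by which those routines get faster (assumed, e.g. 2.0)
    """
    return 1.0 / ((1.0 - blas_fraction) + blas_fraction / blas_speedup)


for f in (0.2, 0.5, 0.9):
    print(f"{f:.0%} of time in BLAS, routines 2x faster -> {overall_speedup(f, 2.0):.2f}x overall")
```

With these assumed numbers the whole program gets about 1.11x faster at 20%, 1.33x at 50% and 1.82x at 90%, which is why it pays to profile first and see how much time really goes into BLAS/LAPACK.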

> By the way, I followed up on one of the suggestions to load both processors
> on each of the Athlon and Xeon nodes, to check memory bandwidth with 2 processors
> running simultaneously. My conclusion remains the same: the Xeon 2.2 GHz is
> 50% faster than the Athlon XP 2100+.

That is very interesting.

The results from your experiments here are really a good read. I hope we
can talk you into running a few more tests   :)


:   jakob at   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :

More information about the Beowulf mailing list