ATHLON vs XEON: number crunching
Jakob Oestergaard
jakob at unthought.net
Sat Jun 22 10:00:08 PDT 2002
On Fri, Jun 21, 2002 at 07:57:47AM -0400, Ivan Oleynik wrote:
...
> > Is there any chance you can re-run the benchmarks with better
> > optimization enabled ? That would be really interesting to a lot of us
> > here on the list.
> >
>
> What optimization options for the PGI compiler can you suggest to address
> memory throughput problems? I am more than willing to test this.
> My original thought was just to avoid any substantial optimization tuning,
> and use the generic -O1 option for both platforms. By the way, running the
> Xeon binary (PGI compiled with -tp piv) on Athlon and vice versa does not
> make any substantial difference.
Sorry, I have no specific suggestions. It's been a while since I played
with PGI compilers, and I have never used them for production stuff.
My idea is that if the compiler optimizes the code "better", the
optimized code will most likely cause less L1/L2 cache traffic, which in
turn will cause less memory bus traffic. This may help if the
Athlons are seeing memory throughput problems.
On the other hand - the effects may also prove to be insignificant.
...
> > Any chance you can try using ATLAS ?
> >
> > You would need to compile one ATLAS for the Intel CPUs and one for the
> > AMD ones.
> >
>
> For the purpose of comparison, I don't need to use ATLAS, because the same
> pieces of BLAS and LAPACK source code are compiled for both platforms. It
> would make sense to use it if I could prove that by playing with
> optimization options I can tweak the binary to make the Athlon outperform
> the Xeon.
ATLAS does a lot of tweaking and experimenting by itself - not just
compiler options, but blocking parameters of the algorithms, and other
very important things that compilers just cannot do well.
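To illustrate what a "blocking parameter" is, here is a minimal sketch
in Python. It is not ATLAS code; the function names and the `block`
argument are made up for this example. The blocked version computes the
same product as the plain triple loop, but works on block x block tiles
so that a compiled implementation could keep each tile in cache. Picking
a good tile size per CPU is the kind of thing ATLAS searches for
empirically.

```python
def matmul_naive(a, b, n):
    """Reference triple loop on n x n matrices stored as lists of rows."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_blocked(a, b, n, block):
    """Same product, computed tile by tile.

    `block` is the (hypothetical) tuning knob: the edge length of the
    tiles of a, b and c that are worked on together.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):              # tile row of c
        for kk in range(0, n, block):          # tile along the inner dimension
            for jj in range(0, n, block):      # tile column of c
                # Multiply one tile of a by one tile of b; min() handles
                # the ragged edge when block does not divide n.
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + block, n)):
                            c[i][j] += aik * b[k][j]
    return c
```

In pure Python this only shows the loop structure; the speed effect
appears in compiled code, where the best tile size depends on the cache
sizes of the particular CPU. That dependence is why separate ATLAS
builds for the Intel and the AMD machines are needed.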
I would really recommend that you try it. If your code spends most of
its time in BLAS/LAPACK routines, switching to ATLAS may well change
more than any set of compiler options.
>
> By the way, I followed up on one of the suggestions to load both
> processors on each of the Athlon and Xeon nodes to check memory bandwidth
> with 2 processors running simultaneously. My conclusion remains the same:
> Xeon 2.2 GHz is 50% faster than Athlon XP 2100+.
That is very interesting.
The results from your experiments here are really a good read. I hope we
can talk you into running a few more tests :)
Cheers,
--
................................................................
: jakob at unthought.net : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob Østergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand. :
:.........................:............{Konkhra}...............: