Athlon SDR/DDR stats for *specific* gaussian98 jobs
Josip Loncaric
josip at icase.edu
Thu May 3 07:49:57 PDT 2001
"Robert G. Brown" wrote:
>
> IIRC, somebody on the list (Josip Loncaric?) inserted prefetching into
> at least parts of ATLAS for use with athlons back when they were first
> released. It apparently made a quite significant difference in
> performance.
It was not me (we have Pentiums). However, prefetching and SSE
instructions should make a significant difference. For example,
Portland Group suggests compiling LAPACK and BLAS with the following
switches (using PGI compilers release 3.2-4 and an SSE-enabled Linux
kernel, i.e. version 2.2.10 or later with the appropriate patches):
Pentium III: -fast -pc 64 -Mvect=sse -Mcache_align -Kieee
Athlon: -fast -pc 64 -Mvect=prefetch -Kieee
The only exceptions are slamch.f and dlamch.f, which must be compiled
with '-O0'. Also, the main program should be compiled with '-pc 64'
(64-bit double-precision format).
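The recipe above amounts to a per-file lookup: one switch set per CPU,
with LAPACK's machine-parameter routines slamch.f and dlamch.f built at
-O0 instead. The sketch below encodes that lookup in Python. The helper
name pgi_flags, the CPU keys, and the pgf77 driver name are illustrative
assumptions, not part of PGI's or LAPACK's tooling, and the sketch does
not handle the main program's '-pc 64' requirement or the Pentium 4
switches.

```python
import os
import shlex
from typing import Dict, List

# Per-CPU switches quoted in the post for PGI compilers release 3.2-4.
PGI_FLAGS: Dict[str, List[str]] = {
    "piii": ["-fast", "-pc", "64", "-Mvect=sse", "-Mcache_align", "-Kieee"],
    "athlon": ["-fast", "-pc", "64", "-Mvect=prefetch", "-Kieee"],
}

# LAPACK machine-parameter routines that must be built without optimization.
NO_OPT_FILES = {"slamch.f", "dlamch.f"}


def pgi_flags(source: str, cpu: str) -> List[str]:
    """Return the switches for one LAPACK/BLAS source file (hypothetical helper)."""
    if cpu not in PGI_FLAGS:
        raise ValueError(f"unknown cpu {cpu!r}; expected one of {sorted(PGI_FLAGS)}")
    if os.path.basename(source) in NO_OPT_FILES:
        return ["-O0"]
    # Return a copy so callers cannot mutate the shared table.
    return list(PGI_FLAGS[cpu])


if __name__ == "__main__":
    # Assumption: the PGI Fortran 77 driver is installed as pgf77.
    for src in ("dgemm.f", "dlamch.f"):
        cmd = ["pgf77", "-c", *pgi_flags(src, "athlon"), src]
        print(shlex.join(cmd))
```

Run as a script, this prints "pgf77 -c -fast -pc 64 -Mvect=prefetch
-Kieee dgemm.f" followed by "pgf77 -c -O0 dlamch.f". Keeping the
exceptions in a separate set makes it easy to drive a Makefile or build
script from one table.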
PGI says that in some cases a 23% performance benefit can be obtained
when prefetch instructions are used. This helps both single- and
double-precision codes.
For single-precision codes only, the Pentium III SSE instructions can
deliver a benefit of about 33%. Since SSE instructions operate only on
single-precision data and run fastest when that data is aligned
(cache-line alignment also satisfies SSE's 16-byte alignment
requirement), enforcing this alignment with '-Mcache_align' produces an
even better 61% gain over the original non-SSE code (says PGI).
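Assuming both of PGI's figures are speedups measured against the same
non-SSE baseline (the post states this only for the 61% figure), the
extra gain attributable to alignment alone is the ratio of the two:

```latex
\frac{1 + 0.61}{1 + 0.33} = \frac{1.61}{1.33} \approx 1.21
```

That is, roughly a further 21% on top of SSE without '-Mcache_align'.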
Finally, PGI release 3.2-4 also supports the Pentium 4 SSE2 instructions
(-tp piv -Mvect=sse ...).
Sincerely,
Josip
--
Dr. Josip Loncaric, Research Fellow mailto:josip at icase.edu
ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134