[Beowulf] AMD64 results...
Richard Walsh
rbw at ahpcrc.org
Thu Dec 16 07:16:45 PST 2004
All,
Here are the data again, comparing the gcc, PGI, and Pathscale compilers
on our cluster and Bill's Opteron, with prefetching turned on in PGI and
gcc as well. Our system has the same clock as Bill's, 2.2 GHz, but slower
memory (PC2700). I have thrown in some X1 SSP timings as well. The
numbers demonstrate the importance of explicitly asking for prefetching
on the non-Pathscale compilers. Pathscale still comes out on top here (at
about half the X1 SSP rate), but the numbers are now much closer, and the
remaining differences may be partly accounted for by the faster memory
in Bill's system (PC3200 versus PC2700 for ours).
I include the X1 single SSP data as well. Of course, if you are focused
on raw bandwidth, you should get numbers both with and without
prefetching; otherwise you are silently including cache effects.
The equivalent *one processor* megaflop ratings for the triad data below
are:

gcc (noprefetch):       186 MFLOPs
gcc (prefetch):         279 MFLOPs
pgcc (prefetch):        300 MFLOPs
pscalecc (prefetch):    347 MFLOPs
x1cc (vector, 1ssp):    780 MFLOPs

Dual processor ratings should be close to double these on the Opteron.
So I expect one node (two CPUs) on the Opteron to be almost equal to one
SSP on the X1.
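
For anyone who wants to check the arithmetic, here is a small sketch of how the triad bandwidth figures map onto the megaflop ratings above. It assumes the usual STREAM accounting for triad (a[i] = b[i] + q*c[i]): two flops and three 8-byte doubles moved per iteration, so MFLOPs = (MB/s) / 12. The function and variable names are just illustrative, and the doubling for a two-CPU node is the same rough estimate as in the text, not a measurement.

```python
# Triad kernel: a[i] = b[i] + q * c[i]
# Assumed accounting per iteration: 2 flops (one multiply, one add)
# and 3 doubles moved (two loads, one store) = 24 bytes.
FLOPS_PER_ITER = 2
BYTES_PER_ITER = 3 * 8


def triad_mflops(rate_mb_s):
    """Convert a STREAM triad rate in MB/s to MFLOPs."""
    return rate_mb_s * FLOPS_PER_ITER / BYTES_PER_ITER


# Triad rates (MB/s) copied from the tables in this post.
triad_rates = {
    "gcc (noprefetch)": 2237.3599,
    "gcc (prefetch)": 3349.1914,
    "pgcc (prefetch)": 3604.1280,
    "pscalecc (prefetch)": 4173.8781,
    "x1cc (vector, 1ssp)": 9360.5935,
}

for name, rate in triad_rates.items():
    print(f"{name:22s} {triad_mflops(rate):6.1f} MFLOPs")

# Rough estimate only: one dual-CPU Opteron node (2x the Pathscale
# single-CPU figure) relative to one X1 SSP.
node_vs_ssp = 2 * triad_mflops(triad_rates["pscalecc (prefetch)"]) / triad_mflops(
    triad_rates["x1cc (vector, 1ssp)"]
)
print(f"2 Opteron CPUs / 1 X1 SSP: {node_vs_ssp:.2f}")
```

The ratings quoted above are these values with the fractional part dropped.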
Enjoy and prefetch!
rbw

gcc-3.2.3 -O4 -Wall -pedantic:
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:         2004.8056     0.0095     0.0080     0.0099
Scale:        2044.7551     0.0099     0.0078     0.0105
Add:          2272.3092     0.0133     0.0106     0.0137
Triad:        2237.3599     0.0134     0.0107     0.0137

gcc-3.2.3 -O4 -fprefetch-loop-arrays -Wall -pedantic:
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:         3259.9273     0.0049     0.0049     0.0052
Scale:        3294.9803     0.0049     0.0049     0.0049
Add:          3306.7241     0.0073     0.0073     0.0073
Triad:        3349.1914     0.0072     0.0072     0.0072

pgcc -fast -Mvect=sse -Mnontemporal
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:         3227.6291     0.0050     0.0050     0.0052
Scale:        3210.1824     0.0050     0.0050     0.0050
Add:          3571.3935     0.0067     0.0067     0.0068
Triad:        3604.1280     0.0067     0.0067     0.0068

Pathscale-1.4 -O3
Function    Rate (MB/s)   Avg time   Min time   Max time
Copy:         3764.6831     0.1540     0.1700     0.1800
Scale:        3764.6831     0.1530     0.1700     0.1700
Add:          4173.8781     0.2080     0.2300     0.2400
Triad:        4173.8781     0.2110     0.2300     0.2400

X1cc -c -h inline3,scalar3,vector3 -h stream0
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:         7600.2280     0.0022     0.0021     0.0022
Scale:        7600.5529     0.0024     0.0021     0.0030
Add:          9259.1164     0.0026     0.0026     0.0027
Triad:        9360.5935     0.0026     0.0026     0.0026

Greg Lindahl wrote:
>On Wed, Dec 15, 2004 at 06:29:56PM -0800, Bill Broadley wrote:
>
>>Kudos for the pathscale-1.4 compiler with -O3.
>
>Thank you! The not-so-secret secret is to use non-temporal stores,
>which we do automagically where needed with plain -O3.
>
>-- greg