[Beowulf] AMD64 results...
Kozin, I (Igor) i.kozin at dl.ac.uk
Thu Dec 16 05:27:33 PST 2004
Hi Bill, very interesting results.
> Ah, got icc-8.1 to cooperate, dual 2.2 GHz Opteron+PC3200+2.4 kernel,
> 915.5MB array:
> -O1
> Function     Rate (MB/s)   Avg time   Min time   Max time
> Copy:          2285.8039     0.2640     0.2800     0.3200
> Scale:         2206.9798     0.2690     0.2900     0.3000
> Add:           2341.5554     0.3740     0.4100     0.4200
> Triad:         2181.9031     0.4060     0.4400     0.4800
>
> -O2
> Function     Rate (MB/s)   Avg time   Min time   Max time
> Copy:          2370.4856     0.2570     0.2700     0.3400
> Scale:         2285.8280     0.2670     0.2800     0.3400
> Add:           2461.6513     0.3710     0.3900     0.4600
> Triad:         2285.8229     0.3920     0.4200     0.5000
Please note that your "Avg time" is sometimes less than your "Min time".
> -O3
> Function     Rate (MB/s)   Avg time   Min time   Max time
> Copy:          2461.5867     0.2730     0.2600     0.3400
> Scale:         2370.4237     0.2910     0.2700     0.3500
> Add:           2526.3684     0.4050     0.3800     0.4800
> Triad:         2341.5151     0.4320     0.4100     0.5100
>
> The strange thing is they are 32 bit binaries, despite being built
> on a 64 bit os on a 64 bit hardware.
How do you know they are not 64-bit? From what I see, they are.
My setup: quad 2.2 GHz Opteron, 9 GB, SLES 9, 2.4.21.
It seems my memory is a bit slower than yours.
PARAMETER (n=32000000,offset=0,ndim=n+offset,ntimes=50)
i.e. using 732 MB
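As a quick check of the 732 MB figure, here is a minimal sketch of the arithmetic. It assumes the usual STREAM layout of three arrays of 8-byte double-precision elements, and that "MB" here means 2^20 bytes; neither is stated in the message, and the function name is made up for illustration.

```python
# Sketch: memory footprint of the three STREAM arrays.
# Assumptions (not stated in the message): 8-byte elements, three arrays
# (a, b, c), and "MB" meaning 2**20 bytes.

def stream_footprint_mb(n, offset=0, bytes_per_word=8, arrays=3):
    """Return the total size of the STREAM arrays in MB (2**20 bytes)."""
    ndim = n + offset  # mirrors ndim=n+offset in the PARAMETER line
    return arrays * bytes_per_word * ndim / 2**20

# n taken from the PARAMETER line above
print(f"{stream_footprint_mb(32_000_000):.1f} MB")
```

Under the same assumptions, the 915.5 MB quoted earlier would correspond to n=40000000, though that value of n is inferred and not given in the thread.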
pathscale 1.4 -O3
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:          3555.5778     0.1444     0.1440     0.1450
Scale:         3483.0084     0.1473     0.1470     0.1480
Add:           3588.8372     0.2142     0.2140     0.2150
Triad:         3605.6772     0.2134     0.2130     0.2140
ifort -O3 -xW
Copy:          3657.1588     0.1458     0.1400     0.1500
Scale:         3657.1588     0.1475     0.1400     0.1500
Add:           3490.9503     0.2273     0.2200     0.2300
Triad:         3339.1509     0.2317     0.2300     0.2400
> Not sure why the timer is so lousy,
> I had to make the array large to get a reasonably accurate time:
This is indeed another interesting point. I'd really like to understand it.
In addition, when I rerun STREAM, the rates vary quite a bit between runs,
despite the high loop count (50) and the very small spread within each run
(the min and max times are pretty close).
For example, two more runs with ifort -O3 -xW:
Copy:          2560.0146     0.2094     0.2000     0.2200
Scale:         2560.0146     0.2094     0.2000     0.2200
Add:           2477.4389     0.3219     0.3100     0.3300
Triad:         2400.0023     0.3285     0.3200     0.3300

Copy:          3657.1588     0.1454     0.1400     0.1500
Scale:         3657.1588     0.1473     0.1400     0.1500
Add:           3490.9503     0.2256     0.2200     0.2300
Triad:         3339.1371     0.2300     0.2300     0.2300
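One possible reading of these numbers, offered as a sketch and not a diagnosis: STREAM derives each rate from the minimum time, counting two 8-byte words per element for Copy and Scale and three for Add and Triad. With n=32000000, a Min time of 0.14 s versus 0.20 s reproduces the two Copy rates reported by the ifort runs, so the run-to-run change in rate follows directly from the change in Min time. The helper name below is hypothetical, and the 8-byte word size is an assumption.

```python
# Sketch: reproduce STREAM rates from the reported minimum times.
# Assumptions: 8-byte words, n = 32,000,000 (from the PARAMETER line), and
# STREAM's usual counting of words moved per element for each kernel.

WORDS_PER_ELEMENT = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

def stream_rate_mb_s(kernel, min_time_s, n=32_000_000, bytes_per_word=8):
    """Rate in MB/s (10**6 bytes per second) implied by a minimum time."""
    bytes_moved = WORDS_PER_ELEMENT[kernel] * bytes_per_word * n
    return bytes_moved / min_time_s / 1e6

# Min times taken from the ifort runs in this message.
for kernel, t in [("Copy", 0.14), ("Copy", 0.20), ("Add", 0.22), ("Add", 0.31)]:
    print(f"{kernel:5s} min={t:.2f}s -> {stream_rate_mb_s(kernel, t):8.1f} MB/s")
```

The ifort Min times all fall on multiples of 0.01 s, which suggests (without proving) a timer tick of that size in those builds; the pathscale times resolve to 0.001 s.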
Igor
> I played around with various mentioned optimizations (including -xW)
> on the manpage, I never managed a 64 bit binary with icc-8.1 though.
> The man page has numerous i32em and em64t references.
>
> --
> Bill Broadley
> Computational Science and Engineering
> UC Davis
I. Kozin (i.kozin at dl.ac.uk)
CCLRC Daresbury Laboratory
tel: 01925 603308
http://www.cse.clrc.ac.uk/disco
