Pentium IV Xeon memory bandwidth. Any experience?
Robert G. Brown
rgb at phy.duke.edu
Mon Jun 25 09:35:10 PDT 2001
On Mon, 25 Jun 2001, Greg Lindahl wrote:
> On Mon, Jun 25, 2001 at 02:19:15PM +0200, Thomas Guignon wrote:
>
> > We have tested and 1.2 Ghz with PC2100 DDR with Level 1 Blas and results are
> > quite nice:
> > -dnrm2: (one vector read)
> > 1450 10^6 B/s
> > -ddot: (two vector read)
> > 1040 10^6 B/s
> > -daxpy (one vector read and one vector read/write)
> > 1150 10^6 B/s
> > - copy (one vector read and one vector write)
> > 990 10^6 B/s
>
> These numbers look like they are for vectors that fit into cache.
>
> What does the STREAM benchmark report for this board? I bet it's
> substantially slower, and the person asking the question wanted main
> memory bandwidth, not cached bandwidth.
Greg,
Here is stream for a 1.33 GHz Tbird with PC2100:
rgb at ganesh|T:108>stream_gcc
# Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:           608.4571     0.0619     0.0263     0.1465
Scale:          497.4804     0.0778     0.0322     0.1523
Add:            658.1838     0.0779     0.0365     0.1368
Triad:          587.9196     0.1086     0.0408     0.1614
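For anyone who hasn't looked inside stream: the four rows are four simple loops over large arrays of doubles, and the rate is the number of bytes each loop is counted as moving divided by the best (minimum) time. A minimal C sketch under that assumption follows; the function names are illustrative, not those of the stream source. The Copy row above is consistent with arrays of 1,000,000 doubles (16 x 10^6 bytes / 0.0263 s is roughly 608 MB/s), but that length is inferred from the output, not stated in it.

```c
/* Sketch of the four STREAM-style kernels and the MB/s calculation.
   Names are illustrative; array length and scalar are up to the caller. */
#include <assert.h>
#include <stddef.h>

/* Copy: c[i] = a[i]  (one array read, one array written) */
void stream_copy(double *c, const double *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i];
}

/* Scale: b[i] = s * c[i]  (one read, one write) */
void stream_scale(double *b, const double *c, double s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = s * c[i];
}

/* Add: c[i] = a[i] + b[i]  (two reads, one write) */
void stream_add(double *c, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Triad: a[i] = b[i] + s * c[i]  (two reads, one write) */
void stream_triad(double *a, const double *b, const double *c,
                  double s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}

/* Bytes counted per pass: 2 arrays for Copy/Scale, 3 for Add/Triad. */
double stream_bytes(size_t n, int arrays_touched)
{
    assert(arrays_touched == 2 || arrays_touched == 3);
    return (double)arrays_touched * (double)sizeof(double) * (double)n;
}

/* Rate in MB/s, with 1 MB = 10^6 bytes, from the best time in seconds. */
double stream_rate_mbs(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return 1.0e-6 * bytes / seconds;
}
```

To see main memory rather than cache, each array has to be much larger than the largest cache; timing one of these loops over a few thousand elements would give cached bandwidth instead.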
For comparison, here is stream for a 1.33 GHz Tbird with PC133:
rgb at g15|T:104>stream_gcc
# Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:           392.4357     0.0408     0.0408     0.0412
Scale:          395.3837     0.0405     0.0405     0.0405
Add:            431.7950     0.0557     0.0556     0.0560
Triad:          430.1768     0.0559     0.0558     0.0562
About what you'd expect: 200 (2 x 100 MHz)/133 = 1.5
Copy:  608/392 = 1.55
Scale: 497/395 = 1.26
Add:   658/431 = 1.53
Triad: 587/430 = 1.37
The PC2100 is about 50% faster than PC133, and so, roughly, are the
streaming float rates out where memory is the bottleneck (Scale and
Triad gain somewhat less).
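The ratios can be recomputed from the unrounded rates in the two tables; a small C sketch, with array and function names chosen here purely for illustration, is:

```c
/* Recompute the PC2100 / PC133 speedups from the two stream tables.
   The rates are copied from the output quoted in this message;
   the identifiers are illustrative. */
#include <assert.h>
#include <stddef.h>

/* Order: Copy, Scale, Add, Triad (MB/s). */
static const double pc2100_rate[4] = {608.4571, 497.4804, 658.1838, 587.9196};
static const double pc133_rate[4]  = {392.4357, 395.3837, 431.7950, 430.1768};

/* Ratio of two rates; the slower (reference) rate must be positive. */
double speedup(double fast, double slow)
{
    assert(slow > 0.0);
    return fast / slow;
}

/* Arithmetic mean of the per-kernel speedups. */
double mean_speedup(const double *fast, const double *slow, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += speedup(fast[i], slow[i]);
    return sum / (double)n;
}
```

Calling mean_speedup(pc2100_rate, pc133_rate, 4) averages the four ratios, which lands a little below the 1.5 figure because Scale and Triad gain less than Copy and Add.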
These runs don't use the Athlon prefetch, though, which might be
responsible for the higher numbers quoted above.
Which leads to the very practical question: How does one use the Athlon
prefetch? Is there a compiler option? Or does one have to code in
assembler? Where would one find out -- my local AMD rep was learning
about this from me instead of the other way around, so clearly "ask your
AMD rep" is a bad answer to this...;-)
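One possible route that avoids assembler, offered as a sketch rather than an answer: newer GCC releases (and Clang) provide a __builtin_prefetch(addr, rw, locality) builtin, and GCC has a -fprefetch-loop-arrays option that asks the compiler to insert prefetches itself on targets that support them. Whether either one actually produces Athlon prefetch instructions depends on the compiler version and the -march setting, so that part is an assumption to check against the generated assembly. The prefetch distance below is a hypothetical tuning knob, and the loop is a stream-style triad, not code from stream itself.

```c
/* Triad loop with explicit software prefetch hints (GCC/Clang builtin).
   PREFETCH_AHEAD is a hypothetical tuning value, not a recommendation. */
#include <assert.h>
#include <stddef.h>

#define PREFETCH_AHEAD 64   /* elements ahead of the current index */

void triad_prefetch(double *a, const double *b, const double *c,
                    double s, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Hint only while the target element is still inside the arrays.
           rw = 0 means a read; locality = 0 means the data is not
           expected to be reused after this access. */
        if (i + PREFETCH_AHEAD < n) {
            __builtin_prefetch(&b[i + PREFETCH_AHEAD], 0, 0);
            __builtin_prefetch(&c[i + PREFETCH_AHEAD], 0, 0);
        }
        a[i] = b[i] + s * c[i];
    }
}
```

The hints never change the result, only (possibly) the timing, so the way to judge them is to time this loop against the plain triad on arrays much larger than cache. On a target without a prefetch instruction the builtin compiles to nothing.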
rgb
P.S. For further comparison, here is stream run on a 933 MHz PIII with PC133:
rgb at parvati|T:103>stream_gcc
# Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:           289.6763     0.0553     0.0552     0.0556
Scale:          325.9658     0.0492     0.0491     0.0494
Add:            360.1278     0.0667     0.0666     0.0667
Triad:          304.0168     0.0790     0.0789     0.0793
showing clearly inferior performance even though it is equipped with the
same speed of memory (note that CPU clock is basically irrelevant to
stream). A 933 MHz PIII equipped with RDRAM yields:
rgb at b16|T:105>stream_gcc
# Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:           441.1835     0.0369     0.0363     0.0372
Scale:          441.1966     0.0363     0.0363     0.0363
Add:            577.6035     0.0416     0.0416     0.0416
Triad:          360.9077     0.0666     0.0665     0.0667
which is a bit better than a PC133 Tbird on most of the tests, but not
as good (or anywhere near as cheap) as a PC2100-equipped Tbird.
Hope somebody finds this useful.
rgb
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu