Athlon memory speed asymmetry
Robert G. Brown rgb at phy.duke.edu
Tue Feb 25 14:02:51 PST 2003
- Previous message: Athlon memory speed asymmetry
- Next message: Athlon memory speed asymmetry
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tue, 25 Feb 2003, David Mathog wrote:

> I'm observing an odd asymmetry in memory utilization on
> Athlons. Trivial array operations (read then write)
> are up to 30% faster going up through the array than
> down through the array.
...
> Here's a tiny example program (97 lines, mostly comments):
...
> Anybody else seen this before?
> Who knows what causes it?
> Is there a way around it on these Athlons?

Very interesting. No, I hadn't noticed this one, although I have seen
some real oddities in arithmetic rates before on both Intel and AMD.

To get more precise timing measurements of the differential, and to be
able to study them as a function of vector size and stride, I embedded
this in the memtest portion of cpu_rate. The cpu_rate memtest just does
a simple read/write of a memory location in an unsigned int vector. The
vector length and the stride can both be specified on the command line.
The test is wrapped in a timing harness that subtracts off the time
required for the loop itself and any embedding overhead (I hope). It
thus lets you time the microscopic operation at fairly high precision in
different cache contexts, which may help us figure this out.

The cpu_rate tool (with the reverse sequential memtest) is available on
the brahma site:

  http://www.phy.duke.edu/brahma

or my personal website,

  http://www.phy.duke.edu/~rgb

under the Beowulf tab.

Some results from cpu_rate are given below, on my 1333 MHz Athlon (with
PC 2100 DDR).

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

# ========================================================================
# Timing "Empty" Loop
# Samples = 100  Loop iterations per sample = 4096
# Time: 2291.8379 +/- 0.8568 nsec
# ========================================================================
# Timing test 7
# Samples = 100  Loop iterations per sample = 1024
# Time: 6247.6912 +/- 60.0939 nsec
#========================================================================
# Sequential Integer Memory (read/write) Access:
# size = 1000  stride = 1  vector length = 4000:
# aitmp = ai[aindex]
# ai[aindex] = aitmp
# where aindex = ai[i] = i initially.
# NANOTIMER granularity (nsec/cycle) = 0.750
# avg_time_full = 3.123846 avg_time_empty = 1.145919
# Average Time: 1.98 nanoseconds
# BogomegaRate: 505.58 megaseqmem int read/writes per second

rgb at ganesh|T:901>cpu_rate -t 8
# ========================================================================
# Timing "Empty" Loop
# Samples = 100  Loop iterations per sample = 4096
# Time: 2293.3385 +/- 0.8943 nsec
# ========================================================================
# Timing test 8
# Samples = 100  Loop iterations per sample = 1024
# Time: 6250.7871 +/- 59.6240 nsec
#========================================================================
# Sequential (backwards) Integer Memory (read/write) Access:
# size = 1000  stride = 1  vector length = 4000:
# aitmp = ai[aindex]
# ai[aindex] = aitmp
# where aindex = ai[i] = i initially.
# NANOTIMER granularity (nsec/cycle) = 0.750
# avg_time_full = 3.125394 avg_time_empty = 1.146669
# Average Time: 1.98 nanoseconds
# BogomegaRate: 505.38 megabackseqmem int read/writes per second

rgb at ganesh|T:906>cpu_rate -t 9
# ========================================================================
# Timing "Empty" Loop
# Samples = 100  Loop iterations per sample = 4096
# Time: 2291.3671 +/- 0.7767 nsec
# ========================================================================
# Timing test 9
# Samples = 100  Loop iterations per sample = 1024
# Time: 6227.6784 +/- 7.6977 nsec
#========================================================================
# Random Integer Memory (read/write) Access:
# size = 1000  stride = 1  vector length = 4000:
# aitmp = ai[aindex]
# ai[aindex] = aitmp
# where aindex = ai[i] = shuffled index.
# NANOTIMER granularity (nsec/cycle) = 0.750
# avg_time_full = 3.113839 avg_time_empty = 1.145684
# Average Time: 1.97 nanoseconds
# BogomegaRate: 508.09 megarandmem int read/writes per second
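
For anyone who wants to reproduce the shape of this measurement without downloading cpu_rate, here is a minimal sketch in C of the idea described above: sweep an unsigned int vector forwards or backwards at a given stride, read and write back each element, and subtract the time of a loop that does the same index arithmetic without touching the array. This is not the cpu_rate source. Every name here (now_nsec, sweep, empty_sweep, net_nsec, mega_rate, time_access_nsec) is made up for illustration, and the use of POSIX clock_gettime as the timer is an assumption.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Monotonic wall-clock time in nanoseconds (POSIX clock_gettime). */
double now_nsec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return 1e9 * (double)ts.tv_sec + (double)ts.tv_nsec;
}

/* Read each visited element and write it back, ascending (dir > 0) or
 * descending (dir <= 0). Returns the number of elements touched.
 * The volatile qualifier keeps the compiler from dropping the accesses. */
size_t sweep(volatile unsigned int *ai, size_t size, size_t stride, int dir)
{
    size_t touched = 0;
    assert(stride > 0);
    for (size_t k = 0; k < size; k += stride) {
        size_t i = (dir > 0) ? k : size - 1 - k;
        unsigned int aitmp = ai[i];   /* read  */
        ai[i] = aitmp;                /* write */
        touched++;
    }
    return touched;
}

/* Same loop control and index arithmetic, but no array access.
 * ASSUMPTION: the volatile sink only stops the compiler from deleting
 * the loop, so this "empty" loop is a bit heavier than a truly empty one. */
size_t empty_sweep(size_t size, size_t stride, int dir)
{
    volatile size_t sink = 0;
    size_t touched = 0;
    assert(stride > 0);
    for (size_t k = 0; k < size; k += stride) {
        sink = (dir > 0) ? k : size - 1 - k;
        touched++;
    }
    return touched;
}

/* Per-operation time with the loop overhead subtracted off. */
double net_nsec(double avg_full, double avg_empty)
{
    return avg_full - avg_empty;
}

/* Millions of operations per second for a per-operation time in nsec. */
double mega_rate(double nsec_per_op)
{
    return 1000.0 / nsec_per_op;
}

/* Average net nsec per read/write pair over `samples` timed sweeps. */
double time_access_nsec(unsigned int *ai, size_t size, size_t stride,
                        int dir, int samples)
{
    double full = 0.0, empty = 0.0;
    size_t touched = 0;
    assert(samples > 0);
    for (int s = 0; s < samples; s++) {
        double t0 = now_nsec();
        touched = sweep(ai, size, stride, dir);
        double t1 = now_nsec();
        empty_sweep(size, stride, dir);
        double t2 = now_nsec();
        full += t1 - t0;
        empty += t2 - t1;
    }
    assert(touched > 0);
    return net_nsec(full, empty) / ((double)samples * (double)touched);
}
```

Calling time_access_nsec once with dir = +1 and once with dir = -1 on the same vector gives a forward/backward pair to compare, and varying size and stride moves the test between cache contexts. The subtraction step can be checked against the output above: for test 7, 3.123846 - 1.145919 = 1.977927 nsec, and mega_rate(1.977927) comes out at about 505.58, which agrees with the reported BogomegaRate.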
More information about the Beowulf mailing list
