Athlon memory speed asymmetry

Robert G. Brown rgb at phy.duke.edu
Tue Feb 25 14:02:51 PST 2003


On Tue, 25 Feb 2003, David Mathog wrote:

> I'm observing an odd asymmetry in memory utilization on
> Athlons.  Trivial array operations (read then write)
> are up to 30% faster going up through the array than
> down through the array.
...
> Here's a tiny example program (97 lines, mostly comments):
...
> Anybody else seen this before?
> Who knows what causes it?
> Is there a way around it on these Athlons?

Very interesting.  No, I hadn't noticed this one, although I have seen
some real oddities in arithmetic rates before on both Intel and AMD.

To get more precise timing measurements of the differential and to be
able to study them as a function of vector size and stride, I embedded
this in the memtest portion of cpu_rate.  The cpu_rate memtest just does
a simple read/write of a memory location in an unsigned int vector whose
length can be specified on the command line, at a stride that can also
be specified on the command line, wrapped up in a timing harness that
subtracts off the time required for the loop itself and any embedding
overhead (I hope).  It thus lets you time the microscopic operation at a
fairly high precision in different cache contexts, which may help us
figure this out.
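
For anyone who wants to try the idea without pulling down cpu_rate, here
is a minimal sketch in C of the kind of harness described above.  It is
NOT the cpu_rate source.  The names now_ns, time_pass_ns and access_ns
are made up for illustration, the access pattern follows the description
printed in the test output (fetch aindex = ai[i], then read and write
ai[aindex]), and C11 timespec_get stands in for whatever timer cpu_rate
really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: current time in nanoseconds via C11 timespec_get. */
static double now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return 1e9 * (double)ts.tv_sec + (double)ts.tv_nsec;
}

/*
 * Time one pass over ai[0..n-1] at the given stride.
 * forward != 0 walks upward, forward == 0 walks downward.
 * touch == 0 runs the same loop without the memory accesses ("empty" loop).
 * Note: at high optimization levels the compiler may delete the empty loop,
 * in which case nothing useful is subtracted.
 */
static double time_pass_ns(volatile unsigned int *ai, size_t n, size_t stride,
                           int forward, int touch)
{
    size_t count = (n + stride - 1) / stride;   /* elements visited per pass */
    double start = now_ns();

    for (size_t k = 0; k < count; k++) {
        size_t i = forward ? k * stride : (count - 1 - k) * stride;
        if (touch) {
            unsigned int aindex = ai[i];      /* index is fetched from the vector */
            unsigned int aitmp = ai[aindex];  /* read */
            ai[aindex] = aitmp;               /* write the same value back */
        }
    }
    return now_ns() - start;
}

/*
 * Average nanoseconds per read/write access over `samples` passes,
 * with the empty-loop time subtracted off.
 */
static double access_ns(unsigned int *ai, size_t n, size_t stride,
                        int forward, int samples)
{
    assert(n > 0 && stride > 0 && samples > 0);

    size_t count = (n + stride - 1) / stride;
    double full = 0.0, empty = 0.0;

    for (size_t i = 0; i < n; i++)
        ai[i] = (unsigned int)i;              /* ai[i] = i initially */

    for (int s = 0; s < samples; s++) {
        full  += time_pass_ns(ai, n, stride, forward, 1);
        empty += time_pass_ns(ai, n, stride, forward, 0);
    }
    return (full - empty) / ((double)samples * (double)count);
}
```

Calling access_ns(ai, 1000, 1, 1, 100) and access_ns(ai, 1000, 1, 0, 100)
on the same vector gives a forward and a backward figure that can be
compared directly.  Growing n past the cache sizes, or changing the
stride, moves the test into different cache contexts.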

The cpu_rate tool (with the reverse sequential memtest) is available on
the brahma site:

  http://www.phy.duke.edu/brahma

or my personal website, http://www.phy.duke.edu/~rgb under the Beowulf
tab.

Some results from cpu_rate are given below, on my 1333 MHz Athlon (with
PC2100 DDR).

   rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


#========================================================================
# Timing "Empty" Loop
# Samples = 100  Loop iterations per sample = 4096
# Time:  2291.8379 +/-     0.8568 nsec
#========================================================================
# Timing test 7
# Samples = 100  Loop iterations per sample = 1024
# Time:  6247.6912 +/-    60.0939 nsec 
#========================================================================
# Sequential Integer Memory (read/write) Access:
# size = 1000  stride = 1  vector length = 4000:
#   aitmp = ai[aindex]
#   ai[aindex] = aitmp
#   where aindex = ai[i] = i initially.
# NANOTIMER granularity (nsec/cycle) =  0.750
# avg_time_full = 3.123846 avg_time_empty = 1.145919 
# Average Time:   1.98 nanoseconds
# BogomegaRate: 505.58 megaseqmem int read/writes per second

rgb at ganesh|T:901>cpu_rate -t 8
#========================================================================
# Timing "Empty" Loop
# Samples = 100  Loop iterations per sample = 4096
# Time:  2293.3385 +/-     0.8943 nsec
#========================================================================
# Timing test 8
# Samples = 100  Loop iterations per sample = 1024
# Time:  6250.7871 +/-    59.6240 nsec 
#========================================================================
# Sequential (backwards) Integer Memory (read/write) Access:
# size = 1000  stride = 1  vector length = 4000:
#   aitmp = ai[aindex]
#   ai[aindex] = aitmp
#   where aindex = ai[i] = i initially.
# NANOTIMER granularity (nsec/cycle) =  0.750
# avg_time_full = 3.125394 avg_time_empty = 1.146669 
# Average Time:   1.98 nanoseconds
# BogomegaRate: 505.38 megabackseqmem int read/writes per second

rgb at ganesh|T:906>cpu_rate -t 9
#========================================================================
# Timing "Empty" Loop
# Samples = 100  Loop iterations per sample = 4096
# Time:  2291.3671 +/-     0.7767 nsec
#========================================================================
# Timing test 9
# Samples = 100  Loop iterations per sample = 1024
# Time:  6227.6784 +/-     7.6977 nsec 
#========================================================================
# Random Integer Memory (read/write) Access:
# size = 1000  stride = 1  vector length = 4000:
#   aitmp = ai[aindex]
#   ai[aindex] = aitmp
#   where aindex = ai[i] = shuffled index.
# NANOTIMER granularity (nsec/cycle) =  0.750
# avg_time_full = 3.113839 avg_time_empty = 1.145684 
# Average Time:   1.97 nanoseconds
# BogomegaRate: 508.09 megarandmem int read/writes per second
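
The "Average Time" and "BogomegaRate" lines in the output above can be
recovered from the avg_time_full and avg_time_empty values.  The sketch
below shows that arithmetic in C.  The function names are hypothetical,
and the formulas are inferred from the printed numbers, not taken from
the cpu_rate source.

```c
#include <assert.h>

/* Net time per access in nsec: full loop time minus empty loop time. */
static double net_ns(double avg_time_full, double avg_time_empty)
{
    return avg_time_full - avg_time_empty;
}

/* Millions of accesses per second for a given time per access in nsec. */
static double mega_rate(double ns_per_access)
{
    assert(ns_per_access > 0.0);
    return 1000.0 / ns_per_access;   /* 1e9 nsec/sec divided by 1e6 */
}
```

With the values printed above, net_ns(3.123846, 1.145919) is about 1.98
nsec and mega_rate of that is about 505.58, matching the forward test.
The backward and random tests work out the same way to about 505.38 and
508.09.  The spread between the three is smaller than the scatter quoted
on the individual timings.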


