Pentium IV Xeon memory bandwidth. Any experience?

Tue Jun 26 09:51:27 PDT 2001

On Tue, 26 Jun 2001, Greg Lindahl wrote:

> On Tue, Jun 26, 2001 at 11:49:48AM +0200, Thomas Guignon wrote:
>
> > In my opinion the STREAM benchmark reports more on the compiler's
> > ability to generate efficient code than on the hardware performance.
>
> It measures both. If your compiler isn't generating efficient code,
> then maybe you should use a better compiler, because most people don't
> write their applications in assembly.
>
> > I think it's difficult to conclude anything about memory bandwidth
> > from a gcc-compiled STREAM.
>
> This is correct. There are a couple of compilers for x86 which use SSE
> prefetch instructions that you could use.

Or perhaps a better way of looking at it is that stream measures the
speed with which your system does four particular, reasonably common
vector operations on very long vectors, using whatever compiler you
happen to be inclined to use.  Hence it >>might<< be relevant to the
rates you would observe for (very) similar operations in your own code
compiled with that compiler.  That is, I generally agree with you, Greg,
but think you aren't pessimistic enough about what stream measures.
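
For reference, the four operations in question are copy, scale, add and
triad.  A minimal sketch of them in C follows; the function names and
the stream_mbytes_per_sec() helper are made up for illustration and are
not taken from the stream source.  The byte count assumes stream's usual
convention of charging two arrays for copy and scale and three for add
and triad.

```c
#include <stddef.h>

/* copy: c[i] = a[i] (touches 2 arrays) */
void stream_copy(double *c, const double *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i];
}

/* scale: b[i] = s * c[i] (touches 2 arrays) */
void stream_scale(double *b, const double *c, double s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = s * c[i];
}

/* add: c[i] = a[i] + b[i] (touches 3 arrays) */
void stream_add(double *c, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* triad: a[i] = b[i] + s * c[i] (touches 3 arrays) */
void stream_triad(double *a, const double *b, const double *c,
                  double s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}

/* Hypothetical helper: convert one timed pass into MB/s (10^6 bytes/s).
 * arrays_touched is 2 for copy/scale and 3 for add/triad. */
double stream_mbytes_per_sec(size_t n, int arrays_touched, double seconds)
{
    return (double)arrays_touched * sizeof(double) * n / seconds / 1.0e6;
}
```

How tight a loop the compiler makes of each of these is exactly what is
being argued about above; the C source is the same either way.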

It might just as well not be relevant.  ATLAS shows fairly clearly that
linear algebra rates can be improved by a factor of 2-3 by optimal code
reorganization.  It also takes only a few instructions in a core loop to
push code from the memory bound regime to the CPU bound regime (exactly
how many I'm curious about, actually, and have thought of measuring).
Finally, stream returns results that differ by a fairly consistent
fraction when run with dynamically allocated memory (ptr/malloc) instead
of compiled-in data vectors, and because stream-1 uses compiled-in
vectors it doesn't sweep memory sizes.  Most folks, at a guess, allocate
memory vectors dynamically, at least in C code, since otherwise one has
to recompile to change the problem size, which is a drag.
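
To make the last two points concrete, here is a rough sketch (not the
stream source, and not cpu-rate) of a triad run over compiled-in vectors
and over malloc'ed vectors whose length is chosen at run time.  The
names triad_work, triad_static, triad_dynamic and the extra_ops knob are
invented for this example; extra_ops adds that many multiply-adds per
element, which is one way one might probe where a loop crosses from
memory bound to CPU bound.

```c
#include <stddef.h>
#include <stdlib.h>

/* Compiled-in vectors: changing the size means recompiling. */
#define STATIC_N 100000
static double sa[STATIC_N], sb[STATIC_N], sc[STATIC_N];

/* Triad with extra_ops additional multiply-adds per element. */
static void triad_work(double *a, const double *b, const double *c,
                       double s, size_t n, int extra_ops)
{
    for (size_t i = 0; i < n; i++) {
        double x = b[i] + s * c[i];
        for (int k = 0; k < extra_ops; k++)
            x = x * 0.5 + 1.0;   /* arithmetic only, no memory traffic */
        a[i] = x;
    }
}

/* Run the kernel on the static vectors; returns the last element. */
double triad_static(double s, int extra_ops)
{
    for (size_t i = 0; i < STATIC_N; i++) {
        sb[i] = 1.0;
        sc[i] = 2.0;
    }
    triad_work(sa, sb, sc, s, STATIC_N, extra_ops);
    return sa[STATIC_N - 1];
}

/* Run the kernel on malloc'ed vectors of run-time length n.
 * Returns the last element, or -1.0 if n is 0 or allocation fails. */
double triad_dynamic(size_t n, double s, int extra_ops)
{
    if (n == 0)
        return -1.0;
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    double result = -1.0;
    if (a && b && c) {
        for (size_t i = 0; i < n; i++) {
            b[i] = 1.0;
            c[i] = 2.0;
        }
        triad_work(a, b, c, s, n, extra_ops);
        result = a[n - 1];
    }
    free(a);
    free(b);
    free(c);
    return result;
}
```

Wrapping the triad_work() call in a timer and looping triad_dynamic()
over n (and over extra_ops) would give a memory-size sweep without a
recompile; the timing code is left out here on purpose.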

Still, stream with things like compiler choice held fixed is likely
relevant to >>comparative<< hardware performance, as a >>relative<<
measure with a single compiler.  Running stream (or cpu-rate, which
sweeps the memory size) doesn't tell me what the actual hardware memory
bandwidth is at a deep technical level, but I don't feel horribly wrong
in concluding things like "Gee, the Athlon memory subsystem for PC133
memory beats the hell out of the Intel PC133 subsystem for vector-like
code and costs less money".

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu