[Beowulf] Re: vectors vs. loops

Robert G. Brown rgb at phy.duke.edu
Wed May 4 12:03:51 PDT 2005


On Wed, 4 May 2005, Eugen Leitl wrote:

> On Wed, May 04, 2005 at 09:19:35AM -0600, Josip Loncaric wrote:
> 
> > That may work for games, but not for everyone.  A common operation like
> > 
> > C = A + B
> > 
> > is very fast when A, B, and C are small enough to fit into the cache 
> > simultaneously.  However, for scientific computing, the size of these 
> > vectors could be 1 GB each (per CPU!), and the problem is memory 
> > bandwidth bound.  Today's memory bandwidths cannot support full CPU 
> > speed on a problem like this.
> 
> There are tricks to optimize available memory bandwidth on modern x86
> architectures though, as described in
> 
> http://leitl.org/docs/comp/AMD_block_prefetch_paper.pdf
> 
> (and far more in http://leitl.org/docs/comp/AMD64softoptguide.pdf ).

Awesome documents -- very informative!  I'm saving copies for my own
edification (presuming that is permitted by their respective licenses).

Do you have any idea how the "fully optimized loops" in the example code
compare timewise to gcc results for obvious implementations of the same
loops, or ditto for other compilers?  How necessary is it for us to
start inlining assembler in order to get a threefold improvement in
effective throughput in a straightforward core loop?  Do compilers
automatically use block prefetch and three phase implementations of the
floating point involved?
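
For concreteness, here is a minimal sketch of what I mean by the "obvious
implementation" of C = A + B, next to a variant with software prefetch
hints. This is NOT the block prefetch code from the AMD paper. The function
names, the PREFETCH_AHEAD distance of 64 elements, and the use of the
GCC/Clang extension __builtin_prefetch are all my own assumptions for
illustration, and whether the hints help at all will depend on the CPU and
the compiler.

```c
#include <stddef.h>

/* The "obvious" loop: two loads and one store per element. */
void vec_add(double *c, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Hypothetical prefetch distance in elements (an untuned assumption). */
#define PREFETCH_AHEAD 64

/* Same arithmetic, with read-prefetch hints issued ahead of the loads.
 * __builtin_prefetch(addr, rw, locality) is a GCC/Clang extension:
 * rw = 0 means "prefetch for read", locality = 0 means "no reuse expected".
 * On other compilers the hints are compiled out and this is the plain loop. */
void vec_add_prefetch(double *c, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        if (i + PREFETCH_AHEAD < n) {
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 0);
            __builtin_prefetch(&b[i + PREFETCH_AHEAD], 0, 0);
        }
#endif
        c[i] = a[i] + b[i];
    }
}

/* Nominal bytes moved by one pass: read a, read b, write c.
 * This ignores any extra traffic from write-allocate on the stores. */
double vec_add_bytes(size_t n)
{
    return 3.0 * (double)sizeof(double) * (double)n;
}
```

Timing either function on vectors much larger than cache and dividing
vec_add_bytes(n) by the elapsed time gives a rough effective bandwidth
figure that could be compared against the numbers quoted in the paper.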

   rgb

> It would be interesting to know whether DDR2 (and coming DDR4) will
> especially profit from the above, given that latency is arguably getting
> worse (I think the same applies to RAMBUS-type memories, which seem to be
> the default memory for the Cell CPU).
> 
> Does anyone have a DDR2 machine and could run the numbers?
>  
> > A fact of life in scientific computing, e.g. CFD, is that the workload 
> > resembles "C=A+B".  People try to get better reuse of data in cache, but 
> > there is only so much that an algorithm will allow.  Thus, memory (and 
> > network) bandwidths remain the main bottleneck.


-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




