Athlon goes DDR

Robert G. Brown rgb at phy.duke.edu
Tue Oct 31 09:26:39 PST 2000


On Tue, 31 Oct 2000, David N. Lombard wrote:

> > Seriously, this looks very promising.  According to my benchmarks, there
> > is a class of code (mostly big-memory vector arithmetic operations) that
> > will just about double if they do indeed succeed in doubling memory
> > bandwidth.  However, they didn't mention (or if they did I didn't see
> > it) the effect on memory latency.  Are we to presume that DDR won't
> > affect memory latency relative to SDRAM?  I would guess that if it
> > significantly improved it they would have said something...
> 
> Having worked with one of those codes for a long time, I can assure you
> that latency is not the issue, these are bandwidth dominant codes.  The

Vector operations are indeed bandwidth dominant, but not all codes are
vector dominant, and there are even simulation codes (e.g. ones that do
random site selection) that are latency dominated, rather than the mix
that probably describes "most code".  That's why I asked (and Greg
answered).  Latency matters, just not so much to folks who are
inverting large matrices.
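
To make the distinction concrete, here is a minimal C sketch (purely
illustrative, mine, not taken from any of the codes under discussion)
that touches the same large array two ways: a streaming pass that is
limited by memory bandwidth, and a random-site pass where nearly every
access pays the full memory latency:

/* Illustrative sketch only: streaming (bandwidth-bound) versus
 * random-site (latency-bound) access over an array much larger than
 * cache.  Sizes and iteration counts are arbitrary choices. */
#include <stdio.h>
#include <stdlib.h>

#define N (4 * 1024 * 1024)     /* ~32 MB of doubles, defeats the cache */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double sum = 0.0;
    long i;

    if (a == NULL)
        return 1;
    for (i = 0; i < N; i++)
        a[i] = 1.0;

    /* Streaming pass: consecutive loads, limited by memory bandwidth. */
    for (i = 0; i < N; i++)
        sum += a[i];

    /* Random-site pass: each load is likely a cache miss, so the time
     * is dominated by memory latency rather than peak bandwidth. */
    for (i = 0; i < N; i++)
        sum += a[rand() % N];

    printf("%f\n", sum);
    free(a);
    return 0;
}

Time the two loops separately (wrap each in gettimeofday() calls) and
the difference between the bandwidth-bound and latency-bound regimes
shows up even on a desktop machine.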

The exact same issue arises with network IPCs -- throughput sucks when
one sends lots of small packets in an environment where aggregation
isn't possible, because the traffic is latency bound rather than
limited by the wirespeed-related peak bandwidth.  Beowulfs frequently
do relatively poorly for this kind of parallel application, because
(with the exception of very high performance and expensive networks
like Myrinet) "over the counter" network latencies tend to be hundreds
of microseconds.  Other kinds of parallel applications send big chunks
of data all at once, where the (e.g.) ~100+ microsecond latency is
irrelevant because the large transfers can be sent at a rate that
saturates the physical bandwidth.
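
A back-of-the-envelope model makes the small-packet point: if each
message costs roughly latency + size/bandwidth, the fraction of the
wire speed you actually see is almost entirely a function of message
size.  The numbers below (100 microseconds of latency, 100 Mbit/sec
wire) are illustrative placeholders, not measurements:

/* Toy model only: effective throughput as a fraction of wire speed
 * for a range of message sizes, assuming t = latency + size/bandwidth.
 * The latency and bandwidth figures are made-up round numbers. */
#include <stdio.h>

int main(void)
{
    double latency = 100e-6;     /* ~100 microseconds, typical COTS LAN */
    double bandwidth = 12.5e6;   /* 100 Mbit/sec expressed in bytes/sec */
    double sizes[] = { 64.0, 1500.0, 65536.0, 1048576.0 };
    int i;

    for (i = 0; i < 4; i++) {
        double t = latency + sizes[i] / bandwidth;
        printf("%8.0f bytes: %5.1f%% of wire speed\n",
               sizes[i], 100.0 * (sizes[i] / bandwidth) / t);
    }
    return 0;
}

A 64 byte message spends nearly all of its time in latency; a megabyte
message spends nearly all of it moving bits, which is why the big-chunk
applications saturate the wire and the small-packet ones never get
close.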

> app in question is one of the few that does well with RDRAM.  We have
> always been more concerned about memory b/w than CPU speed.  Back when
> Intel first released the 150 MHz P5, we did a benchmark, rejumpering
> the carrier from a 150 MHz CPU w/ 60 MHz FSB to a 133 MHz CPU w/ 66
> MHz clock; the latter was 10% faster.
> 
> Equally significant in the press release was the 266 MHz FSB.

I agree.  I know that you know all this stuff -- I'm only writing it
out on the list so that list newbies don't get confused and immediately
run out and spend massive amounts on bleeding edge memory subsystems,
and then feel wounded when they don't deliver improved performance
relative to CPU clock.  It is important to understand the KIND of
(e.g.) memory access that is dominant in your application before you
guesstimate whether or not any given memory "improvement" is a
cost/benefit win.
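
For a rough guesstimate of what a bandwidth doubling like DDR is worth,
a toy Amdahl-style model (my own illustration, not a benchmark) is
enough: if a fraction f of your runtime is spent waiting on memory
bandwidth, doubling the bandwidth buys at most a speedup of 1/(1 - f/2):

/* Toy speedup model only: if a fraction f of runtime is bandwidth
 * bound, doubling memory bandwidth halves that part of the runtime,
 * for an overall speedup of 1/(1 - f/2). */
#include <stdio.h>

int main(void)
{
    double f;

    /* 0.25 is exactly representable, so the loop hits 1.0 exactly. */
    for (f = 0.0; f <= 1.0; f += 0.25)
        printf("bandwidth-bound fraction %.2f -> speedup %.2fx\n",
               f, 1.0 / (1.0 - f / 2.0));
    return 0;
}

A pure vector code (f near 1) just about doubles, which is the class of
code mentioned above; a cache-friendly or latency-bound code (f near 0)
barely notices the faster memory, no matter what the peak bandwidth
numbers on the box say.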

You might find the ALS talk pictures at e.g.

  http://www.phy.duke.edu/brahma

interesting.  They show basically the same thing.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu






