Speed of writes to cache and memory.

Mark Hahn hahn at physics.mcmaster.ca
Thu Feb 27 14:18:06 PST 2003

> How fast can the average processor write to local main memory, both in
> bursts to successive addresses and to random addresses?

usually formulated as bandwidth and latency.

these parameters are changing pretty quickly.  well, OK, bandwidth
is changing fairly quickly; latency is, like most latencies, improving
all too slowly.

on a typical ia32 from maybe 3 years ago (athlon, PC133 say), 
you could expect ~800 MB/s and perhaps 150 ns.  nowadays,
a fairly entry-level DDR box will push more like 1.8 GB/s
and 100 ns.  higher-end machines (dual-DDR or rambu$) 
can get up to around 4 GB/s, but about the same latency.
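
to put those figures in per-cache-line terms, here is a back-of-envelope sketch. the 64-byte line size and the function names are assumptions made for the example, MB/s is taken as 10^6 bytes/s, and real machines overlap accesses, so treat the results as rough bounds rather than measurements.

```python
def line_transfer_ns(line_bytes, bandwidth_mb_per_s):
    """Time to stream one cache line at peak bandwidth, in nanoseconds."""
    return line_bytes / (bandwidth_mb_per_s * 1e6) * 1e9


def latency_bound_mb_per_s(line_bytes, latency_ns):
    """Throughput in MB/s if every line access waits out the full latency
    before the next one starts (no overlap at all)."""
    return line_bytes / (latency_ns * 1e-9) / 1e6


LINE_BYTES = 64  # assumed line size, not taken from the figures above

# (label, bandwidth in MB/s, latency in ns), using the numbers quoted above
for label, bw, lat in [("PC133", 800, 150), ("DDR", 1800, 100), ("dual-DDR", 4000, 100)]:
    burst = line_transfer_ns(LINE_BYTES, bw)
    scattered = latency_bound_mb_per_s(LINE_BYTES, lat)
    print(f"{label}: {burst:.1f} ns per line in a burst, "
          f"{scattered:.0f} MB/s if each access pays full latency")
```

the gap between the two columns is the difference between bursts to successive addresses and fully serialized random accesses.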

I think we're getting to something of a plateau, bandwidth-wise.
I don't think that dram buses will continue scaling in frequency
as nicely as they have recently, but perhaps I'm being cynical.
there's a huge cost associated with going to wider buses.

latency, otoh, can be readily finessed by permitting more outstanding
transactions and cleaning up the dram bus protocol.  putting on my 
cynical hat again, this may be ignored by CPU vendors simply because
process progress is giving us embarrassingly large on-chip caches.
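
the "more outstanding transactions" point can be made concrete with Little's law: bytes in flight = bandwidth x latency. the sketch below is only that arithmetic; the 64-byte line size and the function name are assumptions for illustration.

```python
def outstanding_lines_needed(bandwidth_mb_per_s, latency_ns, line_bytes=64):
    """Minimum number of cache-line transactions that must be in flight
    to keep a memory system at the given bandwidth busy (Little's law).

    Uses integer arithmetic: MB/s * ns gives thousandths of a byte."""
    milli_bytes_in_flight = bandwidth_mb_per_s * latency_ns
    # ceiling division without going through floating point
    return -(-milli_bytes_in_flight // (1000 * line_bytes))


# numbers quoted earlier in this message: 1.8 GB/s at 100 ns, 4 GB/s at 100 ns
print(outstanding_lines_needed(1800, 100))
print(outstanding_lines_needed(4000, 100))
```

with a single outstanding miss at 100 ns, a 64-byte line tops out at 640 MB/s no matter how fast the bus is, which is why allowing more transactions in flight hides latency.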

> I know that in many systems to read a cache line from main memory takes
> between 1/10 and 1/20 of a microsecond.  This is because the CPU has to
> issue the read, find if the desired memory is in its various levels of
> cache, if not, send an address out on the memory bus, wait for the roughly
> 30 ns RAM read delay, and then burst back 8 or so words into cache, at the
> memory-to-CPU bandwidth.

interestingly, a P4 will read 128 bytes on a cache miss, but write 64
(ignoring any prefetching that may also happen.)  I don't know of any other
CPU that does this.  however, all CPUs do critical-word-first, which probably
makes a significant difference.

> If a CPU tries to write to local memory presumably it would be much quicker,
> because the CPU does not have to wait for any response.  

"local memory"?  are you talking ccNUMA?

> That being the
> case, how fast can a CPU write cache lines to different locations in main
> memory?

well, the whole CPU doesn't stall waiting for a write to complete,
if that's what you mean.  the CPU has write buffers between it and
even L1, and there are additional stages of writing, often even in 
the chipset's memory controller.

> Another question: how do most CPUs interact with cache on write operations?

through write buffers.

> Presumably a write to cache takes about the same number of cycles as a read
> from the same level cache.  Is this true?

no, the write goes immediately into the write buffer, and then the instr
can retire, as far as I know...

> My understanding is that in many CPUs one can select either (1) to make all
> memory writes to cache also written through to main memory, or 

well, probably not _all_ writes, but selectable on a page or region basis, 
such as with x86 MTRRs selecting write-back/combining/through.
choosing this in the instr would be interesting, but probably overkill;
controlling it with a chip mode would be a sledgehammer and horribly slow.

> (2) to have
> writes made to main memory only when a cache line is replaced in cache and
> thus swapped out to main memory.  Is this true?

I guess you mean writeback, which is generally the norm everywhere.
the write goes to memory when explicitly flushed or driven out by a 
conflict with some future load/store.
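
to show why writeback is the norm, here is a toy model counting how many writes reach memory under each policy. it assumes a direct-mapped, write-allocate cache that tracks only line addresses; the function name and parameters are made up for the example, and no real CPU is this simple.

```python
def memory_writes(store_lines, num_sets, policy):
    """Count write transactions that reach memory for a sequence of stores.

    store_lines: line addresses (ints) being stored to, in program order
    num_sets:    number of lines in the (hypothetical) direct-mapped cache
    policy:      "write-through" or "write-back"
    """
    if policy == "write-through":
        return len(store_lines)      # every store is forwarded to memory
    if policy != "write-back":
        raise ValueError("policy must be 'write-through' or 'write-back'")

    resident = {}                    # set index -> dirty line held there
    writes = 0
    for line in store_lines:
        idx = line % num_sets
        if idx in resident and resident[idx] != line:
            writes += 1              # conflict: the dirty line is driven out
        resident[idx] = line
    return writes + len(resident)    # explicit flush of what is still dirty


hot = [0] * 100                      # 100 stores to the same line
print(memory_writes(hot, 4, "write-through"), memory_writes(hot, 4, "write-back"))
```

repeated stores to a resident line cost nothing at the memory bus under write-back, while lines that keep conflicting in the same set get written out on each eviction.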

> If so, what are the
> typical delays associated with each type of write?

the CPU only has to dump the write in the write/store buffer, and can go on 
immediately.  the L1 cache probably can't accept stores that fast
continually, though, particularly when there are conflicts.
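
a toy simulation of that behaviour, under loud assumptions: one store attempted per cycle, a buffer of fixed capacity, and a cache that accepts one buffered store every `drain_interval` cycles. all names and parameters are hypothetical; real store buffers coalesce, reorder and forward in ways this ignores.

```python
def simulate_store_buffer(num_stores, capacity, drain_interval):
    """Issue num_stores back-to-back stores through a finite store buffer.

    Returns (total_cycles, stall_cycles)."""
    occupancy = 0   # entries currently waiting in the buffer
    issued = 0      # stores the CPU has handed to the buffer
    stalls = 0      # cycles the CPU spent waiting for a free entry
    cycle = 0
    while issued < num_stores:
        cycle += 1
        # the cache side takes one buffered store every drain_interval cycles
        if occupancy > 0 and cycle % drain_interval == 0:
            occupancy -= 1
        if occupancy < capacity:
            occupancy += 1      # store accepted; the CPU moves on at once
            issued += 1
        else:
            stalls += 1         # buffer full: the CPU has to wait
    return cycle, stalls


print(simulate_store_buffer(4, 8, 4))    # short burst that fits in the buffer
print(simulate_store_buffer(16, 4, 4))   # sustained stores outrun the drain rate
```

a short burst is absorbed with no stalls, but a sustained stream settles down to the drain rate of whatever sits behind the buffer.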

> Is it possible to select which of these two types of writes to use on a
> per-instruction basis?

well, many CPUs provide a set of store-through-cache or similar instructions.

> If one is using method (1) can one cause the main memory image of a cache
> line to be updated under program control?

only to a fairly limited degree.

> Also, are there any systems where one can indicate to the CPU that certain
> information is to be kept in cache and other should not, or is all caching
> controlled only by which of various cache lines have been most recently
> used?

sounds like mtrr/pat to me.

> I would appreciate any enlightenment on these subjects.

perhaps you should also say what you're really trying to do...

regards, mark hahn.

More information about the Beowulf mailing list