[Beowulf] Teraflop chip hints at the future

Mark Hahn hahn at mcmaster.ca
Wed Feb 14 09:17:59 PST 2007

>>> Intel is stacking dram dice above the cpu as an L4 cache, but the article
>> stacking seems like a major hack - I'd rather think about how to do 
>> processor-in-memory (perhaps zram?).
> It's a technology thing.. you can't get DRAM densities with processes used 
> for CPUs and the like. Different fabs, different processes, even though the

1Gb (128MB) seems to be the current state-of-production for normal DRAM;
Intel has 24 MB on some chips, though we mightn't call those production - 
the mass-market chips are at a "mere" 8MB onchip.

so, waving hands wildly, there's about a 16x density advantage; this is 
a bit more than one might expect from transistors per bit (~1 for dram
vs ~6 for sram, iirc), but as you say, dram is highly tweaked for density.
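
A quick check of that hand-waving, as a minimal sketch. It uses only the
figures quoted in this message (a 1Gb dram die, 8MB of on-chip cache, ~1 vs
~6 transistors per bit) and ignores die area, which differs between the two
kinds of part. The variable names are illustrative.

```python
# Figures come from the message above; die areas are NOT accounted for,
# so this compares capacity per chip, not true bits per mm^2.
DRAM_BITS = 2**30            # 1Gb commodity dram die
CACHE_BYTES = 8 * 2**20      # 8MB on-chip cache on a mass-market cpu

dram_bytes = DRAM_BITS // 8
capacity_ratio = dram_bytes / CACHE_BYTES   # dram capacity vs on-chip cache
cell_ratio = 6 / 1                          # transistors per bit, sram vs dram

print(dram_bytes // 2**20, "MB per dram die")
print("capacity ratio:", capacity_ratio)
print("left over after the cell-size difference:", capacity_ratio / cell_ratio)
```

Whatever is left over after dividing out the cell-size difference is the
part one would attribute to process tweaking, subject to the die-area caveat.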

> feature sizes are similar.  There's also some thermal issues.  If you use a 
> CPU process to build ram, it's not very dense (think cache on current

actually, I was more thinking of putting more memory (not necessarily 
standard dram) onto a CPU-oriented process.

> don't know that you can even build a big CPU on a DRAM process.  DRAMs are 
> pretty highly optimized (read, they've spent billions of dollars on tweaking 
> the device models to within a gnats eyelash of the physics limits).. for

that's not the point, of course - even a small CPU on each dram chip would 
add up to a profoundly powerful system.  for instance, take a pretty mundane
2-socket, 16GB workstation today and notice it's got probably 128 separate
dram chips.  imagine if each of those had even a small onchip processor
(say, 2-4M transistors).  the potential is there for something quite
useful (I admit there are practical problems in getting dram
vendors/industry to do such a thing...)
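
The arithmetic behind that example, as a rough sketch. It assumes the
128-chip count and 16GB total from the text, and reads the per-chip
processor budget as 2 to 4 million transistors; the names are illustrative.

```python
# Back-of-envelope numbers for the 2-socket, 16GB workstation example.
SYSTEM_BYTES = 16 * 2**30    # 16GB of installed memory
CHIPS = 128                  # separate dram chips, as estimated above

per_chip_bytes = SYSTEM_BYTES // CHIPS
per_chip_gbit = per_chip_bytes * 8 / 2**30

# Assumed processor budget per dram chip, in millions of transistors.
aggregate_mt = [CHIPS * mt for mt in (2, 4)]

print(per_chip_bytes // 2**20, "MB per chip")
print(per_chip_gbit, "Gb per chip")
print("aggregate logic budget (millions of transistors):", aggregate_mt)
```

Each chip works out to a 1Gb part, and the combined logic budget is a few
hundred million transistors spread across 128 independent memory arrays.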

> instance, because with DRAM you only read or write one location at time, very

well, I have the impression that a lot of the power dissipated by modern
chips is actually the external clock/PLL and drivers.  then again, a dram 
chip only dissipates a fraction of a watt (I looked at a Micron 1Gb ddr2/667:
it could possibly dissipate <.5W with all banks interleaved, but normal
back-to-back sequential activity would be only ~.3W).  that's for ddr2 at
1.8V - ddr3 is 1.5 and I imagine the trend to lower voltages will continue.
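
To put rough numbers on the voltage trend, here is a sketch that assumes
dynamic power scales with the square of the supply voltage (the usual CMOS
approximation) and that capacitance and clock rate stay fixed, which a real
ddr3 part would not honour. The ~.3W figure and the 128-chip count come from
this message; scaled_power is a hypothetical helper.

```python
def scaled_power(p_watts, v_old, v_new):
    """Scale dynamic power by (v_new / v_old)^2; everything else held fixed."""
    return p_watts * (v_new / v_old) ** 2

ddr2_seq_w = 0.3                                 # ~.3W sequential, ddr2 at 1.8V
ddr3_est_w = scaled_power(ddr2_seq_w, 1.8, 1.5)  # same activity at 1.5V (assumed)
system_w = 128 * ddr2_seq_w                      # all 128 chips busy at once

print(f"ddr3 estimate per chip: {ddr3_est_w:.3f} W")
print(f"128 ddr2 chips, all active: {system_w:.1f} W")
```

Under those assumptions the drop from 1.8V to 1.5V alone removes about 30%
of the dynamic power per chip.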

> few transistors change state on any given cycle, so the power dissipation is 
> low.  Compare with a CPU where you have thousands of transistors changing 
> state on a cycle.

that's still a good point.  a single transaction on a current dram would 
only warm up one row of one bank.  probably modelable by ignoring the
dissipation of the array itself, and just counting the control/sense/io

> Go to the IEEE High Speed Digital Interconnect Workshop in Santa Fe this 
> year... there's amazing stuff that people are doing.

alas, my day-job is sys admin/programmer/dogsbody, not designing new,
cutting-edge compute architectures ;(

regards, mark hahn.
