[Beowulf] Teraflop chip hints at the future
Mark Hahn
hahn at mcmaster.ca
Wed Feb 14 09:17:59 PST 2007
>>> Intel is stacking dram dice above the cpu as an L4 cache, but the article
>>
>> stacking seems like a major hack - I'd rather think about how to do
>> processor-in-memory (perhaps zram?).
>
> It's a technology thing.. you can't get DRAM densities with processes used
> for CPUs and the like. Different fabs, different processes, even though the
1Gb (128MB) seems to be the current state-of-production for normal DRAM;
Intel has 24 MB on some chips, though we mightn't call those production -
the mass-market chips are at a "mere" 8MB onchip.
so, waving hands wildly, there's about a 16x density advantage; this is
a bit more than one might expect from transistors per bit (~1 for dram vs
~6 for sram, iirc), but as you say, dram is highly tweaked for density.
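
the 16x figure is just arithmetic on the round numbers above. a minimal
sketch of it follows; the function names are made up for illustration, and
it assumes capacity per chip can stand in for density (die area is ignored,
which is part of the hand-waving):

```python
# Rough check of the dram-vs-cache capacity ratio, using the round
# numbers from the message. Assumption: capacity per chip stands in
# for density; die areas are not taken into account.

def gbit_to_mbyte(gbit):
    """Convert gigabits to megabytes (1 Gb = 1024 Mb, 8 bits per byte)."""
    return gbit * 1024 / 8

def capacity_ratio(dram_gbit, cache_mbyte):
    """How many times more bytes the dram chip holds than the on-chip cache."""
    return gbit_to_mbyte(dram_gbit) / cache_mbyte

dram_mbyte = gbit_to_mbyte(1)      # one 1Gb dram chip
ratio = capacity_ratio(1, 8)       # versus 8MB of on-chip cache
cell_ratio = 6 / 1                 # ~6 transistors per sram bit vs ~1 per dram bit

print(f"1Gb dram = {dram_mbyte:.0f} MB, capacity ratio = {ratio:.0f}x")
print(f"transistors-per-bit ratio = {cell_ratio:.0f}x")
```

so the capacity ratio (16x) is a bit under three times what the
transistors-per-bit ratio (6x) alone would predict.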
> feature sizes are similar. There's also some thermal issues. If you use a
> CPU process to build ram, it's not very dense (think cache on current
actually, I was more thinking of putting more memory (not necessarily
standard dram) onto a CPU-oriented process.
> don't know that you can even build a big CPU on a DRAM process. DRAMs are
> pretty highly optimized (read, they've spent billions of dollars on tweaking
> the device models to within a gnat's eyelash of the physics limits).. for
that's not the point, of course - even a small CPU on each dram chip would
add up to a profoundly powerful system. for instance, take a pretty mundane
2-socket, 16GB workstation today and notice it's got probably 128 separate
dram chips. imagine if each of those had even a small onchip processor
(say, 2-4 million transistors). the potential is there for something quite
useful (I admit there are practical problems in getting dram
vendors/industry to do such a thing...)
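
to put rough numbers on "add up": a small sketch, assuming the 16GB is
built entirely from 1Gb chips (no extra ECC chips counted) and that each
chip gets a 2-4 million transistor processor. the helper names are
hypothetical.

```python
# Back-of-envelope totals for a processor-per-dram-chip system.
# Assumptions: 16GB built from 1Gb chips, ECC chips not counted,
# and 2-4 million transistors of logic added to each chip.

def chip_count(system_gbyte, chip_gbit):
    """Number of dram chips needed to reach the given capacity."""
    return system_gbyte * 8 // chip_gbit

def aggregate_transistors(chips, per_chip_millions):
    """Total processor transistors across all chips, in millions."""
    return chips * per_chip_millions

chips = chip_count(16, 1)
low = aggregate_transistors(chips, 2)
high = aggregate_transistors(chips, 4)

print(f"{chips} dram chips, {low}-{high} million processor transistors in total")
```

that works out to 128 chips, and 256-512 million transistors of processor
logic spread across 128 independent memory arrays.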
> instance, because with DRAM you only read or write one location at a time, very
well, I have the impression that a lot of the power dissipated by modern
chips is actually the external clock/PLL and drivers. then again, a dram
chip only dissipates a fraction of a watt. (I looked at a Micron 1Gb ddr2/667:
it could possibly dissipate <0.5W with all banks interleaved, but normal
back-to-back sequential activity would be only ~0.3W.) that's for ddr2 at
1.8V - ddr3 is 1.5V and I imagine the trend to lower voltages will continue.
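
scaling those per-chip figures up to the 128-chip workstation mentioned
earlier gives a feel for the totals. this is only a sketch: it assumes every
chip is busy at once, and the voltage estimate assumes dynamic power goes as
V^2 with frequency and capacitance held fixed, which is the usual
first-order CMOS approximation rather than anything from the datasheet.

```python
# Rough dram power totals from the per-chip figures above.
# Assumptions: all 128 chips active simultaneously; dynamic power
# scales as V^2 at fixed frequency and capacitance (first-order only).

def system_dram_watts(chips, watts_per_chip):
    """Total dram power if every chip dissipates the same amount."""
    return chips * watts_per_chip

def voltage_scaling(v_new, v_old):
    """Relative dynamic power after a supply voltage change (P ~ V^2)."""
    return (v_new / v_old) ** 2

typical = system_dram_watts(128, 0.3)   # back-to-back sequential activity
worst = system_dram_watts(128, 0.5)     # all banks interleaved (upper bound)
ddr3_factor = voltage_scaling(1.5, 1.8)

print(f"128 chips: ~{typical:.1f} W typical, under {worst:.1f} W worst case")
print(f"ddr3 at 1.5V vs ddr2 at 1.8V: ~{ddr3_factor:.2f}x the dynamic power")
```

so roughly 38W to 64W for all the dram in the box, and the drop to 1.5V
would cut dynamic power to about 0.69x under that assumption.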
> few transistors change state on any given cycle, so the power dissipation is
> low. Compare with a CPU where you have thousands of transistors changing
> state on a cycle.
that's still a good point. a single transaction on a current dram would
only warm up one row of one bank. probably modelable by ignoring the
dissipation of the array itself, and just counting the control/sense/io
logic.
> Go to the IEEE High Speed Digital Interconnect Workshop in Santa Fe this
> year... there's amazing stuff that people are doing.
alas, my day-job is sys admin/programmer/dogsbody, not designing new,
cutting-edge compute architectures ;(
regards, mark hahn.