[Beowulf] Teraflop chip hints at the future

Wed Feb 14 10:15:28 PST 2007

On Wed, Feb 14, 2007 at 09:51:21AM -0800, Jim Lux wrote:

> I'm not sure you could put any processor (except maybe something like 
> a microcontroller) into a DRAM design and keep the densities 
> up.  There are all sorts of things that might bite you.. aside from 

IBM has just announced at the ISSCC a 1-transistor eDRAM
substitute for the 6T-SRAM cell used in caches. (Others
demonstrated 1T-SRAM years ago; AMD has Z-RAM, Intel has
Floating Body Cells, and T-RAM doesn't need a capacitor.
Embedded RAM is reasonably common in network processors,
IIRC.)

http://www.heise.de/newsticker/meldung/85295

It's 45 nm SOI (starting 2008) with 1.5 ns access (SRAM does 0.8..1 ns),
and is supposed to dissipate far less power. Theoretically this gives
you 6 times as much eDRAM as SRAM cache in the same area, which is at
least 12 MBytes, and possibly up to 48 MBytes (the dual-core Power6 has
8 MBytes of on-die cache).
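
As a rough check on that arithmetic, here is a small sketch. It assumes
the only thing that changes is the per-bit transistor count (6 down to 1)
and ignores capacitors, sense amplifiers, and refresh logic. The 2 MByte
baseline is my assumption for where the 12 MByte figure comes from, and
the function name is made up for this example.

```python
def edram_capacity_mbytes(sram_mbytes,
                          sram_transistors_per_bit=6,
                          edram_transistors_per_bit=1):
    """Capacity fitting in the same area if only transistors per bit change.

    Idealized: real cells also differ in capacitor, sense-amp and
    refresh overhead, so treat the result as an upper bound.
    """
    return sram_mbytes * sram_transistors_per_bit / edram_transistors_per_bit


# Illustrative baselines: 2 MBytes (assumed) and 8 MBytes (Power6 on-die cache)
for sram in (2, 8):
    print(f"{sram} MBytes SRAM -> {edram_capacity_mbytes(sram):.0f} MBytes eDRAM")
```

The real density gain would presumably be somewhat lower once the
overheads above are counted.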

> thermal issues, I suspect that the number of mask layers, etc. is 
> fairly small for DRAM.  The actual materials on the chip (doping 
> levels, etc.) may not allow for a reasonably performing processor 
> with reasonable feature sizes and thermal properties.  Getting the 
> heat away from the junction is a big deal.
> 
> I think DRAMs are built with a maximum of 4 layers of interconnect 
> with vias, while processors have a lot more layers and a much more 
> sophisticated interconnect structure.

The processes above are compatible with CPU processes, so there's some
hope that the piggybacking in Terascale doesn't have to be forever.

> Each and every switch has some non-zero power associated with 
> changing state. Sure, the core swings smaller voltages and energies, 
> but a DRAM cell is a lot smaller than a flipflop or half-adder in the 
> CPU, and only one is changing at a time, as opposed to thousands.

On the horizon, there's MRAM, which can also do logic with a small
extension to each cell (a kind of nonvolatile FPGA). It's not
hugely fast, but it's static and very low power.

> A big advantage of integrating CPU and memory, though, is that you 
> don't have to "go offchip" which saves a huge amount in 
> drivers/receivers, etc.   Of course, this is why everyone is looking 

Yes, this is a major advantage. No pads either, just a few
high-speed serial links.

> to integrated photonics and/or real high speed serial 
> interconnects.  The I/O buffer might consume a hundred or thousand 
> times more power than the onchip logic driving it.  Trading some more 
> logic inside to serialize and deserialize, and do adaptive 
> equalization, in exchange for fewer "wires out of the chip" is a good deal.
> 
> Then, there's the speed of light problem.  Put two chips 10cm apart 

Increasing density up to true 3D integration is a very good way
to reduce the average distance. Stacking computation modules
on a 3D lattice also minimizes dead space. Of course, with
current cooling you won't get more than a few tens of MW out of
a wastepaper-basket volume before the cluster goes China Syndrome.

> on a board, and the round trip time (say for address to get there and 
> data to get back) is going to be in the nanoseconds area, even if the 
> chip itself were infinitely fast.

The mammalian CNS has a 120 m/s signalling limit, yet it can process
pretty complex stimuli in a few tens of ms.
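
To put numbers on both cases, here is a back-of-the-envelope sketch.
The velocity factor of 0.5 for a board trace is an assumption (a
ballpark figure for common PCB material), and the function names are
invented for illustration.

```python
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum


def round_trip_ns(distance_m, velocity_factor=0.5):
    """Round-trip propagation delay in ns between two chips.

    velocity_factor is the signal speed as a fraction of c; 0.5 is an
    assumed ballpark for a PCB trace, 1.0 is the vacuum limit.
    """
    return 2 * distance_m / (C_VACUUM_M_PER_S * velocity_factor) * 1e9


def reach_m(speed_m_per_s, time_s):
    """Distance a signal covers at a given speed in a given time."""
    return speed_m_per_s * time_s


print(round_trip_ns(0.10))       # chips 10 cm apart, assumed board trace
print(round_trip_ns(0.10, 1.0))  # same distance at the vacuum limit
print(reach_m(120, 0.010))       # 120 m/s signalling over 10 ms
```

So the 10 cm round trip costs on the order of a nanosecond before the
chips do any work, while a 120 m/s signal covers only about a metre in
10 ms.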

-- 
Eugen* Leitl http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820            http://www.ativel.com
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE