[Beowulf] Clusters just got more important - AMD's roadmap

Wed Feb 8 09:52:55 PST 2012

Odd threading/quoting behavior in my mail client. My comments below are marked with **

-----Original Message-----
From: Eugen Leitl [mailto:eugen at leitl.org] 
Sent: Wednesday, February 08, 2012 8:39 AM
To: Lux, Jim (337C); Beowulf at beowulf.org
Subject: Re: [Beowulf] Clusters just got more important - AMD's roadmap

On Wed, Feb 08, 2012 at 06:18:50AM -0800, Lux, Jim (337C) wrote:

> It can be done (and probably has), but it's going to be "exotic" and 
> expensive.

There are some liquid-metal-cooled GPUs for the gamer market (gallium alloy; strangely enough not sodium/potassium eutectic, which would be much cheaper, albeit a fire hazard if exposed to air). I might also have read about one that uses a metal pump with no moving parts, driven by MHD, though I don't remember where I read that.

There are also plenty of watercooled systems among enthusiasts, including some that put the CPU and GPU coolers in the same circuit.

I could see gamers pushing watercooled systems into the COTS mainstream; it wouldn't be the first time. Multi-GPU setups are reasonably common there, so PCIe would seem like a good initial fabric candidate.

** Those aren't plug-into-a-backplane type configurations, though. They can use permanent connections, or something that is a pain to do but only has to be done once. I could see something using heat pipes, too, possibly mating with some sort of thermal transfer socket, but again, we're talking exotic, and not amenable to the large volumes that would bring the price down.

> >I don't see how x86 should make it to exascale. It's too bad 
> >MRAM/FeRAM/whatever isn't ready for SoC yet.
> 
> 
> Even if you put the memory on the chip, you still have the 
> interconnect scaling problem.  Light speed and distance, if nothing 
> else.  Putting

You can eventually put that on WSI. In fact, somebody wanted to do that with ARM-based nodes and DRAM wafers bonded on top, with redundant routing around dead dies. I presume this would also take care of graceful degradation if you can do it at runtime, or at least reconfigure after failure and go back to the last snapshot. Worst-case distances would then be ~400 mm within the wafer, and possibly shorter if you interconnect these with fiber.

** Even better is free space optical interconnect, but that's pretty speculative today.  And even if you went to WSI (or thick film hybrids or something similar), you're still limited in scalability by the light time delay between nodes.  If you had just the bare silicon (no packages) for all the memory and CPU chips in a 1000 node cluster, that's still a pretty big ball o'silicon.  And a big ball o'silicon that is dissipating a fair amount of heat.
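
To put a rough number on the light time delay over the ~400 mm worst-case wafer distance mentioned above, here is a minimal sketch. The 3 GHz clock and the 0.67 velocity factor (signals in fiber or on-chip wiring travel slower than light in vacuum) are illustrative assumptions, not figures from this thread, and the function names are made up for the example.

```python
# Rough one-way signal delay over a given distance.
# Assumptions (not from the thread): 3 GHz clock, velocity factor 0.67.

C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def one_way_delay_ns(distance_m, velocity_factor=1.0):
    """One-way propagation delay in nanoseconds.

    velocity_factor is the fraction of c at which the signal travels
    (1.0 = vacuum, i.e. the hard lower bound on delay).
    """
    if distance_m < 0 or not 0 < velocity_factor <= 1:
        raise ValueError("distance must be >= 0 and 0 < velocity_factor <= 1")
    return distance_m / (C_VACUUM_M_PER_S * velocity_factor) * 1e9


def delay_in_cycles(distance_m, clock_hz, velocity_factor=1.0):
    """The same delay expressed in clock cycles at clock_hz."""
    return one_way_delay_ns(distance_m, velocity_factor) * 1e-9 * clock_hz


if __name__ == "__main__":
    d = 0.4  # ~400 mm worst case across a wafer
    print(f"vacuum bound:         {one_way_delay_ns(d):.2f} ns")
    print(f"velocity factor 0.67: {one_way_delay_ns(d, 0.67):.2f} ns")
    print(f"cycles at 3 GHz:      {delay_in_cycles(d, 3e9):.1f}")
```

Even at the vacuum bound, crossing 400 mm one way takes a bit over a nanosecond, which is several cycles at GHz clock rates before any switching or serialization overhead is counted.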

The only other way to reduce average signalling distance is real 3D integration.

** I agree.  This has been done numerous times in the history of computing.  IBM had that cryogenically cooled stack a few decades ago.  The "round" Cray is another example.

> everything on a chip just shrinks the problem, but it's just like 15 
> years ago with PC tower cases on shelving and Ethernet interconnects.

Sooner or later you run into questions like "what's within the lightcone of a 1 nm device", at which point you've reached the limits of classical computation, never mind that I don't see how you can cool anything with an effective >THz refresh rate. I'm extremely sceptical about QC feasibility, though there's some work with nitrogen vacancies in diamond which could produce qubit entanglement in solid state, perhaps even at room temperature. I just don't think it would scale well enough, and since Scott Aaronson also appears dubious I'm in good company.
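
The lightcone question can be given a rough scale with a back-of-the-envelope sketch: how far can a signal travel, at best, during one cycle at a given refresh rate? The 1 THz rate and 1 nm device size are taken from the paragraph above; the function names are hypothetical and propagation at the full vacuum speed of light is an optimistic assumption.

```python
# How far light travels in one clock period, and how many devices of a
# given size fit along that distance. Assumes propagation at c in vacuum,
# which is an upper bound for any real interconnect.

C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def reach_per_cycle_m(clock_hz):
    """Maximum one-way distance (metres) a signal covers in one period."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return C_VACUUM_M_PER_S / clock_hz


def devices_within_reach(clock_hz, device_size_m):
    """Number of device widths spanned by that one-cycle reach."""
    if device_size_m <= 0:
        raise ValueError("device_size_m must be positive")
    return reach_per_cycle_m(clock_hz) / device_size_m


if __name__ == "__main__":
    f = 1e12      # 1 THz refresh rate
    size = 1e-9   # 1 nm device
    print(f"reach per cycle: {reach_per_cycle_m(f) * 1e3:.3f} mm")
    print(f"device widths:   {devices_within_reach(f, size):,.0f}")
```

At 1 THz the one-cycle reach is about 0.3 mm, so anything farther away than that cannot influence a device within a single cycle, whatever the interconnect technology.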