[Beowulf] Clusters just got more important - AMD's roadmap
cap at nsc.liu.se
Wed Feb 8 11:27:49 PST 2012
On Wednesday, February 08, 2012 06:15:01 PM Mark Hahn wrote:
> > The APU concept has a few interesting points but certainly also a few
> > major problems (when comparing it to a cpu + stand alone gpu setup):
> > * Memory bandwidth to all those FPUs
> well, sorta. my experience with GP-GPU programming today is that your
> first goal is to avoid touching anything offchip anyway (spilling, etc),
> so I'm not sure this is a big problem. obviously, the integrated GPU
> is a small slice of a "real" add-in GPU, so needs proportionately
> less bandwidth.
Well yes, you want to avoid touching memory on a GPU (just as you do on a CPU).
But just as you can't avoid it completely on a CPU, you can't on a GPU either.
A current socket (CPU) delivers maybe 20 GB/s and 50 GFLOPS, and the GPU, much
faster flop-wise, is also a lot faster in memory access (>200 GB/s).
Now I admit I'm not a GPU programmer, but are you saying those 200 GB/s aren't
needed? My assumption was that what holds for CPU codes (they depend on cache
for performance but still need good memory bandwidth) also holds on GPUs.
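The bandwidth question can be made concrete with the roofline model (a standard way of reasoning about this, not something from the thread): attainable throughput is min(peak, bandwidth x arithmetic intensity). The sketch plugs in the socket figures quoted above (~50 GFLOPS, ~20 GB/s) and the >200 GB/s GPU figure. The triad kernel and its 2 flops per 24 bytes are an illustrative assumption, and the GPU's peak is left out because the post doesn't state it.

```python
# Roofline sketch: attainable throughput is bounded by peak compute or by
# memory bandwidth times arithmetic intensity, whichever is lower.

def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      intensity_flop_per_byte: float) -> float:
    """Roofline ceiling in GFLOPS for a kernel of the given arithmetic intensity."""
    return min(peak_gflops, bandwidth_gbs * intensity_flop_per_byte)

# Socket figures quoted in the post (approximate).
CPU_PEAK_GFLOPS = 50.0
CPU_BW_GBS = 20.0
# GPU memory bandwidth quoted in the post; its peak GFLOPS is not given.
GPU_BW_GBS = 200.0

# Machine balance: flops per byte a kernel needs before compute becomes the limit.
cpu_balance = CPU_PEAK_GFLOPS / CPU_BW_GBS  # 2.5 flop/byte

# Assumed example kernel: STREAM-triad-like a[i] = b[i] + s * c[i] in double
# precision, i.e. 2 flops per 24 bytes moved (ignoring write-allocate traffic).
TRIAD_INTENSITY = 2 / 24

cpu_triad = attainable_gflops(CPU_PEAK_GFLOPS, CPU_BW_GBS, TRIAD_INTENSITY)
# Only the bandwidth side of the GPU roof can be computed from the post's numbers.
gpu_triad_bw_limit = GPU_BW_GBS * TRIAD_INTENSITY

print(f"CPU balance: {cpu_balance:.1f} flop/byte")
print(f"Triad on CPU socket: {cpu_triad:.2f} of {CPU_PEAK_GFLOPS:.0f} GFLOPS")
print(f"Triad bandwidth ceiling at {GPU_BW_GBS:.0f} GB/s: {gpu_triad_bw_limit:.2f} GFLOPS")
```

Under these assumptions a bandwidth-bound kernel gets under 2 of the socket's 50 GFLOPS, and a 10x bandwidth advantage translates directly into a 10x higher ceiling for any kernel whose working set doesn't stay on chip.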
Anyway, my point was mostly that it's a lot easier to deliver hundreds of GB/s
of memory bandwidth on a device with RAM directly on the PCB than on a server
socket.
Also, if the APU is a "small slice of a real GPU", then I question the point
(not much GPU power per classic core or per total system footprint).
> I think the real question is whether someone will produce a minimalist
> APU node. since Llano has on-die PCIE, it seems like you'd need only
> APU, 2-4 dimms and a network chip or two. that's going to add up to
> very little beyond the APU's 65 or 100 W TDP... (I figure 150 W/node
> including PSU overhead.)
I think anything beyond early testing is a fair bit into the future. For the
APU to become interesting I think we need a few (or all) of:
* Memory shared with the CPU in some usable way (did not say the c-word..)
* A proper number-crunching version (ECC...)
* A fairly high-TDP part on a socket with good memory bandwidth
* Noticeably better host-to-device bandwidth and, even more, latency
And don't get me wrong, I'm not saying the above is particularly unlikely...
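The last point, that host-to-device latency matters even more than bandwidth, can be illustrated with the standard latency-bandwidth model t(n) = alpha + n/beta. This is a Python sketch under assumed numbers: the 10 us latency and 6 GB/s bandwidth are hypothetical placeholders, not figures for any real PCIe link or APU.

```python
# Latency-bandwidth ("alpha-beta") model of a host-to-device copy:
#     t(n) = alpha + n / beta
# The parameter values used below are hypothetical placeholders, not measurements.

def transfer_time_us(nbytes: int, latency_us: float, bandwidth_gbs: float) -> float:
    """Time for one transfer in microseconds (1 GB/s = 1e3 bytes per microsecond)."""
    return latency_us + nbytes / (bandwidth_gbs * 1e3)

def effective_bandwidth_gbs(nbytes: int, latency_us: float, bandwidth_gbs: float) -> float:
    """Delivered bandwidth in GB/s once the fixed per-transfer latency is included."""
    return nbytes / (transfer_time_us(nbytes, latency_us, bandwidth_gbs) * 1e3)

def half_bandwidth_size(latency_us: float, bandwidth_gbs: float) -> float:
    """Transfer size n_1/2 (bytes) at which half the peak bandwidth is reached: alpha * beta."""
    return latency_us * bandwidth_gbs * 1e3

# Hypothetical link parameters, chosen only for illustration.
LATENCY_US = 10.0
BANDWIDTH_GBS = 6.0

for n in (4 * 1024, 64 * 1024, 1024 * 1024, 64 * 1024 * 1024):
    eff = effective_bandwidth_gbs(n, LATENCY_US, BANDWIDTH_GBS)
    print(f"{n:>10} bytes: {eff:5.2f} GB/s of {BANDWIDTH_GBS:.0f} GB/s peak")

print(f"n_1/2 = {half_bandwidth_size(LATENCY_US, BANDWIDTH_GBS):.0f} bytes")
```

For transfers smaller than n_1/2 the fixed cost alpha dominates, so cutting latency buys more than raising peak bandwidth, which is why latency is weighted higher in the list.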