[Beowulf] El Reg: AMD reveals potent parallel processing breakthrough

Wed May 1 14:41:48 PDT 2013

> On 05/01/2013 08:51 AM, Christopher Samuel wrote:
>> This sounds interesting..
>>
>> http://www.theregister.co.uk/2013/05/01/amd_huma/

unfortunately, nothing much new there.  we knew from other leaks that 
there would be systems with both normal ddr and gddr mapped into the 
same coherent physical space.  that's really the point of the whole 
HSA thing, and goes along with the impending placement of ram chips
onto the APU package by both AMD and Intel.

this is a very good thing for HPC.  it's obviously not hard to build 
even conventional SIMD CPU architectures that have serious bandwidth
issues with conventional cache-mitigated dram, even without resorting
to GPU-like wider arrays of ALUs.  I don't see that a second kind of 
memory will complicate much - just another flag to mmap (like MAP_HUGETLB).
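
to make the mmap point concrete, here's a rough sketch in C.  MAP_HUGETLB
is a real Linux flag; MAP_FASTMEM_HYPOTHETICAL is a name I made up for
illustration (no such flag exists, so it's defined as 0 here), and
alloc_region is just a helper name.  the idea is only that a placement
request rides along in the flags argument, with a fallback to an
ordinary mapping if the kernel says no:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0   /* not Linux: degrade to a plain mapping */
#endif

/* HYPOTHETICAL: stands in for a "put this in the fast memory" flag.
 * No such flag exists; it is 0 so the sketch compiles and simply
 * behaves like an ordinary anonymous mapping. */
#define MAP_FASTMEM_HYPOTHETICAL 0

/* Map len bytes of anonymous memory, first asking for the extra
 * placement flags, then retrying without them if that request fails
 * (for example, MAP_HUGETLB with no huge pages reserved).
 * Returns NULL if both attempts fail. */
static void *alloc_region(size_t len, int extra_flags)
{
    int base = MAP_PRIVATE | MAP_ANONYMOUS;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   base | extra_flags, -1, 0);

    if (p == MAP_FAILED && extra_flags != 0)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, base, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

a caller would ask for, say, alloc_region(2u << 20, MAP_HUGETLB) today and
swap in the new flag later; the rest of the code doesn't need to know
which kind of memory it got.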
>
> "In today's CPU-GPU computing schemes, when a CPU senses that a process
> upon which it is working might benefit from a GPU's muscle, it has to
...

> The last time I checked, CPUs don't sense anything - the programmer
> has to write the program to use the GPU's muscle.

I'm shocked that anyone would accuse theregister 
of whimsy or anthropomorphization!

though in a narrow sense, "sense" here could mean "application called 
ACML's FFT with appropriate size/params, so use GPU rather than CPU".
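
that narrow sense amounts to a dispatch test inside the library.  a
minimal sketch in C, with everything in it assumed: choose_fft_backend,
backend_t and the GPU_MIN_POINTS threshold are invented names and
numbers, and this is not a claim about how ACML actually decides:

```c
#include <stddef.h>

typedef enum { BACKEND_CPU, BACKEND_GPU } backend_t;

/* HYPOTHETICAL cutoff: below this size, assume the cost of handing
 * the work to the GPU outweighs the speedup.  The value is invented. */
#define GPU_MIN_POINTS ((size_t)1 << 16)

static int is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Pick a backend for an FFT of n_points.  Use the GPU only when one
 * is present, the transform is large enough, and the size is a power
 * of two (assumed here to be the only case the GPU path handles). */
static backend_t choose_fft_backend(size_t n_points, int gpu_available)
{
    if (gpu_available &&
        n_points >= GPU_MIN_POINTS &&
        is_power_of_two(n_points))
        return BACKEND_GPU;

    return BACKEND_CPU;
}
```

so the "sensing" is just the library author's if-statement, written once
so that the application programmer doesn't have to.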

anyway, here's my understanding of the state of things:

- there will be a Haswell chip with in-package ram of some sort.
it's extremely unclear what kind of ram, though - some people claim
it's a custom 128M edram (not sure why 'e', since it's not embedded).
there are also claims that it acts as last-level cache.

- AMD is making APU chips that will talk both ddr and gddr,
the latter presumably on-chip.  they'll be shipping significant 
volumes by way of xbox-next and ps4 consoles...

- Nvidia has a plan for in-package dram as well, but it's years away.

- HMC consortium (incl Micron, Samsung, IBM, but not Intel) has a 
standard that seems well-suited for integration via 2.5d interposer.

from an HPC perspective, faster memory is unambiguously good, even if 
it's fixed in size, unupgradable and asymmetric.  turning the GPU into more
of a first-class on-chip functional unit will provide a much more manageable
programming model.