[Beowulf] Are mobile processors ready for HPC?

Wed May 29 06:37:23 PDT 2013

AM, "Eugen Leitl" <eugen at leitl.org> wrote:

>On Tue, May 28, 2013 at 06:17:19PM +0000, Lux, Jim (337C) wrote:
>
>> But I agree with Bogdan that raw processor speed doesn't necessarily
>>imply scaleable power consumption, etc.
>
>There seems to be a convergence in x86 (Haswell) and ARM core power
>efficiency recently.

I think that's basically driven by things like feature size: the number of
transistors it takes to add two numbers doesn't change, so the variable
left is geometry (which in turn drives speed/power).

> 
>> The enormous volumes of mobile devices do mean that whatever price
>>you are getting those processors at is likely to be as low as it can be
>>(that is, all the "economies of scale" have been fully realized).
>> 
>> But, there's a significant "per chip" cost for handling, socketing,
>>assembly, etc.  And that cost does not change very much with level of
>>integration.  You get to the extreme where when you buy 7400 SSI parts
>>(or, for that matter microcontrollers) that the dominant cost is the
>>package: e.g. a 6 pin package is cheaper than an 8 pin package with the
>>same exact die inside; and mounting the 8 pin package requires 33% more
>>solder paste, and so forth.
>
>The big point for SoCs, especially with stacked memory, is that there are
>fewer solder pads. What is not present in COTS SoCs is signalling fabric,
>but that is arguably not expensive in terms of Si real estate and pin
>count, if compared to memory buses.

Yes, the appeal of higher density packaging (3D packaging, for instance) is
that it sort of reduces the number of pins/pads touching the PWB, but I'm
not sure that it's cheaper overall.  Sometimes it just moves the packaging
and wiring somewhere else.  What really is important from a power
dissipation standpoint is "staying on-chip and not going out to a trace".
I don't know enough about stacked memory to know if they're leveraging the
"all in one package" idea to run the interconnects at something other than
3.3V levels.

I don't see a good way to get around the "transition from die to package
to board to wire" problem for large scale interconnects/fabrics, other
than direct optical interconnects (VCSELs, etc.).  All those transitions
eat energy and impose processing requirements for equalization of the
impedance discontinuities (whether the processing is analog or digital is
sort of immaterial; either way it costs energy).

> 
>> That said, I think that these ideas are important to explore.. Slide 20
>>talks about the lack of ECC.   Well, if you're serious about exascale,
>>you've got to embed fault tolerance into the very fabric of the
>>algorithm, rather than trying to glue it on afterwards with ECC, or
>>network retries, or whatever.
>
>I wonder what happened to MRAM in embeddeds. Here's a fully static design
>which takes no power or leaks, and is rad-proof to boot.

Volume? Yield?  Probably density is too low.. Freescale/Everspin has a 64
Mbit part, which isn't all that big.

They're really, really expensive too.  An Everspin MR4A16BCMA35 (16 bit
I/O, 16 Mbit, 35ns part) is about $25 (QTY: 250).

A Gbyte of MRAM would set you back $10,000-15,000, for just the parts.
Judging from the stock quantities (hundreds of each part #) at Mouser, I'd
say it's in the "curiosity" stage, as opposed to the "mass use and
production" stage.
http://www.mouser.com/catalog/catalogusd/646/278.pdf
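
As a rough check on that Gbyte figure, here is a back-of-envelope sketch in
Python.  It only uses the numbers quoted above (16 Mbit per part, about $25
each at QTY 250) and assumes a binary Gbyte (1024 Mbyte).  The function and
constant names are made up for illustration, and board, assembly, and
controller costs are ignored.

```python
import math

# Figures quoted in the post; names are hypothetical, chosen for illustration.
PART_MBIT = 16          # capacity of one MR4A16BCMA35, in Mbit
PART_PRICE_USD = 25.0   # approximate unit price at QTY 250

def parts_for(gbytes, part_mbit=PART_MBIT):
    """Number of parts needed to hold `gbytes` (assumes 1 Gbyte = 1024 Mbyte)."""
    total_mbit = gbytes * 1024 * 8
    return math.ceil(total_mbit / part_mbit)

def parts_cost(gbytes, part_mbit=PART_MBIT, price=PART_PRICE_USD):
    """Parts-only cost in USD; ignores PWB, assembly, and controller costs."""
    return parts_for(gbytes, part_mbit) * price

if __name__ == "__main__":
    n = parts_for(1)
    print(f"{n} parts, ${parts_cost(1):,.0f} for 1 Gbyte")
```

That works out to 512 parts and $12,800, which sits inside the
$10,000-15,000 range.  With a decimal Gbyte it would be 500 parts and
$12,500.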

Rad hard probably isn't an issue, at least in the Single Event Effects
sense.

>