[Beowulf] Power calculations, double precision, ECC and power of APUs

Lux, Jim (337C) james.p.lux at jpl.nasa.gov
Sat Mar 23 08:48:40 PDT 2013



On 3/22/13 2:10 PM, "Geoffrey Jacobs" <gdjacobs at gmail.com> wrote:

>On 03/22/2013 03:09 PM, Vincent Diepeveen wrote:
>
>> Adding ECC to a GPU is a major problem and really not something that you
>> 'just add'. It has huge bandwidth implications, especially limitations, I
>> understood from a knowledgeable hardware engineer whose name shall not be
>> quoted, who certainly doesn't speak for any known manufacturer, and
>> certainly not when talking to me.
>
>It costs a lot of money to respin an ASIC to include pathways for parity
>bits, scrub logic, etc. The performance costs are not phenomenal.


Not necessarily. There's some non-zero delay through the calculation of
the syndrome bits and/or the error correction logic. Historically, in a
synchronous design you add one "clock" to account for that delay. Back in
the day, with microcoded instructions that took many clocks to execute, you
might not really notice a big hit because memory access took 2 clocks
instead of 1 (or, more realistically, 5 clocks instead of 4: clock out
row, clock out column, wait 1 clock, read value).
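To make the syndrome delay concrete, here's a toy C sketch of a
Hamming(7,4) encoder and syndrome check. Real memory EDAC is typically
SECDED over 64 data bits rather than 4, but the combinational XOR trees
have the same shape, and it's exactly that logic settling that costs the
extra clock:

#include <stdio.h>
#include <stdint.h>

/* Encode 4 data bits into a 7-bit Hamming(7,4) codeword.
   Bit positions (1-based): 1=p1, 2=p2, 3=d1, 4=p3, 5=d2, 6=d3, 7=d4. */
static uint8_t hamming74_encode(uint8_t data)
{
    uint8_t d1 = (data >> 0) & 1;
    uint8_t d2 = (data >> 1) & 1;
    uint8_t d3 = (data >> 2) & 1;
    uint8_t d4 = (data >> 3) & 1;

    uint8_t p1 = d1 ^ d2 ^ d4;   /* covers positions 1,3,5,7 */
    uint8_t p2 = d1 ^ d3 ^ d4;   /* covers positions 2,3,6,7 */
    uint8_t p3 = d2 ^ d3 ^ d4;   /* covers positions 4,5,6,7 */

    return (uint8_t)(p1 | p2 << 1 | d1 << 2 |
                     p3 << 3 | d2 << 4 | d3 << 5 | d4 << 6);
}

/* Compute the 3-bit syndrome: a nonzero value is the 1-based position
   of a single flipped bit. These XOR trees are the combinational delay
   the hardware has to hide. */
static uint8_t hamming74_syndrome(uint8_t cw)
{
    uint8_t s1 = ((cw >> 0) ^ (cw >> 2) ^ (cw >> 4) ^ (cw >> 6)) & 1;
    uint8_t s2 = ((cw >> 1) ^ (cw >> 2) ^ (cw >> 5) ^ (cw >> 6)) & 1;
    uint8_t s3 = ((cw >> 3) ^ (cw >> 4) ^ (cw >> 5) ^ (cw >> 6)) & 1;
    return (uint8_t)(s1 | s2 << 1 | s3 << 2);
}

int main(void)
{
    uint8_t cw = hamming74_encode(0xB);  /* data = 1011 */
    cw ^= 1 << 4;                        /* flip the bit at position 5 */
    uint8_t syn = hamming74_syndrome(cw);
    if (syn)
        cw ^= (uint8_t)(1 << (syn - 1)); /* correct the flagged bit */
    printf("syndrome = %d, codeword = 0x%02X\n", syn, (unsigned)cw);
    return 0;
}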

If you have an architecture that is dominated by memory accesses, then
doubling the access time could have a big effect. You might be able to
build EDAC logic that is faster than the memory cycle time, too. Once you
get into caches and such, there are all kinds of games you can play.
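A back-of-the-envelope model makes the "memory dominated" point (all
numbers here are made up for illustration): take a 4-clock access, one
extra clock for EDAC, and vary what fraction of operations touch memory:

#include <stdio.h>

/* Average cycles per operation when a fraction f of operations are
   memory accesses taking (base + edac) clocks, and the rest take 1.
   Returns the slowdown relative to the no-EDAC case. */
static double slowdown(double f, double base, double edac)
{
    return (f * (base + edac) + (1.0 - f)) /
           (f * base          + (1.0 - f));
}

int main(void)
{
    printf("compute-bound (1 access in 10): %.2fx\n", slowdown(0.1, 4.0, 1.0));
    printf("memory-bound  (every op):       %.2fx\n", slowdown(1.0, 4.0, 1.0));
    printf("memory-bound, access doubled:   %.2fx\n", slowdown(1.0, 4.0, 4.0));
    return 0;
}

One extra clock costs the compute-bound case about 8% but the memory-bound
case the full 25%, and doubling the access time is a flat 2x.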


I have seen pipelined designs where they assume there's no error (true
most of the time), and if the EDAC logic detects a problem (a clock or two
later), they dump the pipeline and restart with the corrected value. This
is similar to the strategy described earlier of having only parity on L1,
where a parity error triggers a "cache miss".
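In software terms the trick looks something like the toy C sketch below.
edac_check() here is a made-up stand-in that injects and "corrects" one
error; in real hardware the check result arrives a clock or two behind the
raw data, which this model only captures as a restart penalty:

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define CHECK_LATENCY 2  /* clocks from raw data to EDAC verdict */

/* Hypothetical stand-in for the SECDED check: flags read #3 as having
   a single-bit error and supplies the corrected word. */
static bool edac_check(int read_id, uint32_t raw, uint32_t *fixed)
{
    if (read_id == 3) {
        *fixed = raw ^ 0x10;   /* undo the injected bit flip */
        return true;           /* correctable error detected */
    }
    *fixed = raw;
    return false;
}

int main(void)
{
    /* Word 3 arrives with a bit flipped; everything else is clean. */
    uint32_t mem[8] = {10, 11, 12, 13 ^ 0x10, 14, 15, 16, 17};
    uint32_t sum = 0;
    int cycles = 0;

    for (int i = 0; i < 8; i++) {
        uint32_t raw = mem[i];
        uint32_t checkpoint = sum;   /* state we can roll back to */
        sum += raw;                  /* speculative use of the raw word */
        cycles += 1;

        uint32_t fixed;
        if (edac_check(i, raw, &fixed)) {
            sum = checkpoint + fixed;  /* flush and replay with the fix */
            cycles += CHECK_LATENCY;   /* pay the restart penalty */
        }
    }
    printf("sum = %u after %d cycles\n", (unsigned)sum, cycles);
    return 0;
}

The common, error-free case pays nothing beyond the bookkeeping; only the
rare corrected read eats the flush-and-restart cost.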





