[Beowulf] Is there really a need for Exascale?

Fri Nov 30 06:00:03 PST 2012

On 11/29/12 3:32 PM, "Einar Rustad" <er at numascale.com> wrote:

>
>On 29. nov. 2012, at 15:52, "Lux, Jim (337C)" <james.p.lux at jpl.nasa.gov>
>wrote:
>
>> Okay.. So SRAM instead of Cache..
>> 
>> Or at least cache that doesn't care about off chip coherency (e.g. No bus
>> snooping, and use delayed writeback)
>> 
>> A good paged virtual memory manager might work as well.
>> 
>> But here's a question... Would a Harvard architecture with separate code
>> and data paths to memory be a good idea? It's pretty standard in the DSP
>> world, which is sort of a SIMD (except it's not really a single
>> instruction... But you do the same thing to many sets of data over and
>> over.. And a lot of exascale type applications: finite element codes,
>> would have the same pattern)
>> 
>
>With the separate L1 I-cache and D-cache of modern processors, they are
>pretty much Harvard architecture already. The L1 I-cache has a very high
>hit-rate for all programs that have a significant runtime (if not, the
>program would have to be gazillions of lines long..). The few I-cache
>misses will not affect the performance of the common data paths very much.

Yes. The cache essentially serves as a smart virtual memory. I suppose
that a question might be what the optimum granularity of that I-cache is,
and whether there would be a better architecture for that cache vs. the
data cache, one that allows more/faster access because you KNOW that the
dynamics of access are different.

Or is the generic approach actually better in the long run, because it's
less specialized and therefore less dependent on clever compiler and
coding tricks to optimize performance?
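
To make the "dynamics of access are different" point concrete, here is a
toy direct-mapped cache model in Python. It is only a sketch: the cache
size, line size, 4-byte instruction width, and the two synthetic access
patterns (a tight loop for code, a single streaming pass for data) are all
assumptions picked for illustration, not a model of any real processor.

```python
def hit_rate(addresses, cache_bytes=32 * 1024, line_bytes=64):
    """Hit rate of a toy direct-mapped cache over a sequence of byte addresses."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines            # block currently held by each cache line
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // line_bytes       # memory block containing this address
        index = block % num_lines        # cache line that block maps to
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block          # miss: fill the line with this block
        total += 1
    return hits / total


def loop_fetches(body_bytes, iterations, instr_bytes=4):
    """Instruction-fetch addresses for one loop body executed repeatedly."""
    for _ in range(iterations):
        for addr in range(0, body_bytes, instr_bytes):
            yield addr


def stream_reads(total_bytes, elem_bytes=8):
    """Data addresses for a single pass over an array (no reuse)."""
    return range(0, total_bytes, elem_bytes)


if __name__ == "__main__":
    for line in (16, 64, 256):
        code = hit_rate(loop_fetches(4096, 100), line_bytes=line)
        data = hit_rate(stream_reads(1 << 20), line_bytes=line)
        print(f"line={line:3d}B  code hit rate={code:.6f}  data hit rate={data:.6f}")
```

In this model the looped code only takes cold misses on its first pass, so
its hit rate is close to 1 at every line size, while the streaming data
hit rate is set almost entirely by how many elements fit in one line. That
is the sort of asymmetry that might justify a different granularity for
the I-cache than for the D-cache.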
