[Beowulf] Engineers boost AMD CPU performance by 20% without overclocking

Tue Feb 28 12:09:28 PST 2012

> The paper is now available online, "CPU-Assisted GPGPU on Fused
> CPU-GPU Architectures":
>
> http://people.engr.ncsu.edu/hzhou/hpca_12_final.pdf

thanks for the reference.

> (I have not read the whole paper yet) I think the core idea is that
> the CPU acts as a prefetch thread and pulls data into the shared L3
> for the GPU cores (this work is like other prefetch thread research

yes, though it's a bit puzzling, since the whole point of GPU design
is to have lots of runnable threads on hand, so that you simply switch
from stalled to non-stalled threads to hide latency.

so in the context of prefetching, I'd expect one bundle of threads to
make a non-prefetched reference and stall, while other bundles keep
the vector unit busy until the reference is resolved.  gotta read the paper I guess!
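
to put rough numbers on the latency-hiding argument above, here is a toy model (not from the paper). it assumes each bundle alternates a fixed burst of compute with a fixed-latency memory stall, and that the scheduler switches bundles at no cost. the function names and the cycle counts in the usage note are made up for illustration.

```python
def alu_utilization(num_bundles, compute_cycles, memory_latency):
    """Fraction of cycles the vector unit is busy in a toy round-robin model.

    Assumptions (hypothetical, for illustration only): every bundle runs
    compute_cycles of work, then stalls for memory_latency cycles, and
    switching between bundles is free.
    """
    if num_bundles <= 0 or compute_cycles <= 0 or memory_latency < 0:
        raise ValueError("need num_bundles > 0, compute_cycles > 0, memory_latency >= 0")
    # A single bundle keeps the unit busy compute_cycles out of every
    # (compute_cycles + memory_latency) cycles; extra bundles fill the
    # stall window until the unit is saturated at 1.0.
    period = compute_cycles + memory_latency
    return min(1.0, num_bundles * compute_cycles / period)


def bundles_to_hide_latency(compute_cycles, memory_latency):
    """Smallest bundle count that saturates the unit: ceil(period / compute)."""
    if compute_cycles <= 0 or memory_latency < 0:
        raise ValueError("need compute_cycles > 0, memory_latency >= 0")
    period = compute_cycles + memory_latency
    return -(-period // compute_cycles)  # integer ceiling division
```

with, say, 10 cycles of compute per 390-cycle stall, this model needs 40 resident bundles to keep the vector unit full. if something (registers, local memory) limits a kernel to fewer than that, stalls are exposed, and that is presumably the regime where pulling data into the shared L3 ahead of time could still pay off.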