[Beowulf] Engineers boost AMD CPU performance by 20% without overclocking

Thu Feb 9 03:42:55 PST 2012

http://www.extremetech.com/computing/117377-engineers-boost-amd-cpu-performance-by-20-without-overclocking

Engineers boost AMD CPU performance by 20% without overclocking

By Sebastian Anthony on February 7, 2012 at 12:44 pm

[Image: AMD Llano APU die (GPU on the right)]

Engineers at North Carolina State University have used a novel technique to
boost the performance of an AMD Fusion APU by more than 20%. This speed-up
was achieved purely through software and using commercial (probably Llano)
silicon. No overclocking was used.

An AMD APU has both a CPU and a GPU on the same piece of silicon.
In conventional applications — in a Llano-powered laptop, for example — the
CPU and GPU hardly talk to each other; the CPU does its thing, and the GPU
pushes polygons. What the researchers have done is to marry the CPU and GPU
together to take advantage of each core’s strengths.

To achieve the 20% boost, the researchers reduce the CPU to a fetch/decode
unit, and the GPU becomes the primary computation unit. This works out well
because CPUs are generally very strong at fetching data from memory, and GPUs
are essentially just monstrous floating point units. In practice, this means
the CPU is focused on working out what data the GPU needs (pre-fetching), the
GPU’s pipes stay full, and a 20% performance boost arises.
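
To make the pre-fetching idea concrete, here is a toy model in Python. It is
only a sketch under loud assumptions: it is not the researchers' code, whose
details are not given here, and every name in it (SharedCache, run,
prefetch_distance) is hypothetical. It models a small LRU cache shared by a
"helper" (standing in for the CPU) and a "compute unit" (standing in for the
GPU), and counts how often the compute unit finds its data missing, with and
without the helper running a few accesses ahead.

```python
from collections import OrderedDict


class SharedCache:
    """Tiny LRU cache standing in for a cache shared by CPU and GPU (assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def touch(self, addr):
        """Access addr. Return True on a hit, False on a miss (the line is then filled)."""
        hit = addr in self.lines
        if hit:
            self.lines.move_to_end(addr)  # mark as most recently used
        else:
            self.lines[addr] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
        return hit


def run(addresses, capacity=16, prefetch_distance=0):
    """Count compute-side misses over an access stream.

    With prefetch_distance > 0, a helper touches the address that the
    compute unit will need prefetch_distance steps later.
    """
    cache = SharedCache(capacity)
    misses = 0
    for i, addr in enumerate(addresses):
        if prefetch_distance:
            ahead = i + prefetch_distance
            if ahead < len(addresses):
                cache.touch(addresses[ahead])  # helper pulls data in early
        if not cache.touch(addr):  # compute unit consumes the data
            misses += 1
    return misses


stream = list(range(64))  # 64 distinct addresses, accessed once each
baseline_misses = run(stream)
prefetched_misses = run(stream, prefetch_distance=4)
print(baseline_misses, prefetched_misses)
```

In this toy the compute unit misses on every access without the helper, and
only on the first few accesses with it. Real hardware is far messier (memory
latency, bandwidth, cache pollution), so the numbers say nothing about the
reported 20% figure; the sketch only shows why a unit that runs ahead
fetching data can keep another unit's pipes full.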

Now, unfortunately we don’t have the exact details of how the North Carolina
researchers achieved this speed-up. We know it’s in software, but that’s
about it. The team probably wrote a very specific piece of code (or a
compiler) that uses the AMD APU in this way. The press release doesn’t say
“Windows ran 20% faster” or “Crysis 2 ran 20% faster,” which suggests we’re
probably looking at a synthetic, hand-coded benchmark. We will know more when
the team presents its research on February 27 at the International Symposium
on High Performance Computer Architecture.

For what it’s worth, this kind of CPU/GPU integration is exactly what AMD is
angling for with its Heterogeneous System Architecture (formerly known as
Fusion System Architecture). AMD has a huge advantage over Intel when it
comes to GPUs, but that means nothing if the software chain (compilers,
libraries, developers) isn’t in place. The good news is that Intel doesn’t
have anything even remotely close to AMD’s APU coming down the pipeline,
which means AMD has a few years to see where this HSA path leads.

If the 20% speed boost can be brought to market in the next year or two, AMD
might actually have a chance.

Updated @ 17:54: The co-author of the paper, Huiyang Zhou, was kind enough to
send us the research paper. It seems production silicon wasn’t actually used;
instead, the software tweaks were carried out on a simulated future AMD APU with
shared L3 cache (probably Trinity). It’s also worth noting that AMD sponsored
and co-authored this paper.

Updated @ 04:11: Some further clarification: Basically, the research paper is
a bit cryptic. It seems the engineers wrote some real code, but executed it
on a simulated AMD APU with L3 cache (i.e. probably Trinity). It does seem
like their working is correct. In other words, this is still a good example
of the speed-ups that heterogeneous systems will bring… in a year or two.

Read more at North Carolina State University