[Beowulf] Nvidia's quantum leap in 28 nm

Sun Mar 25 11:47:00 PDT 2012

It's been 12 years or so since a genius visited me. With his expertise being the same as Einstein's, there's not much question about what his research topics were.

Though he was not deep into computer hardware, he told me that for massive computing, just above 1 GHz would prove to be a big barrier. Electrons basically move at around 1/3 of the speed of light, which translates to 1.3 GHz in metals like aluminium. For copper, he said, that barrier might be a tad higher than for aluminium, yet even then the power needed for such speeds would prove to be massive.

At that moment Intel's marketing department was shouting out loud that their P4s would clock 10 GHz by 2010.

Well, the P4 never got there, and for HPC we got into the megacore count game instead.

AMD
Now, AMD needs 4 PEs for doing double precision, so their previous core count of 1536 actually wasn't more than the 5000 series with 1600. Their new 7970 GPU with 2048 PEs has, for double precision, the equivalent of 512 compute cores.

Actually the 7970 mostly profits from a 100 MHz higher frequency, with some overclocked cards boosting to 1 GHz, and it gets impressive game scores. As for GPGPU, of course, moving from 1536 cores to 2048 is an interesting improvement, yet far away from a doubling. The 7970 is said to have around 4.31B transistors (see http://www.anandtech.com/show/5261/amd-radeon-hd-7970-review).
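A quick back-of-the-envelope sketch of the AMD numbers, assuming (as stated above) that one double precision compute core takes 4 PEs. The function and variable names are made up for illustration:

```python
# Back-of-the-envelope arithmetic for the AMD figures.
# Assumption taken from the text: one double precision "compute core"
# needs 4 processing elements (PEs). Names here are illustrative only.

PES_PER_DP_CORE = 4


def dp_equivalent_cores(pe_count: int, pes_per_dp_core: int = PES_PER_DP_CORE) -> int:
    """Number of double precision compute cores a given PE count amounts to."""
    return pe_count // pes_per_dp_core


previous_pes = 1536  # previous AMD generation, per the text
hd7970_pes = 2048    # Radeon HD 7970, per the text

print("7970 DP-equivalent cores:", dp_equivalent_cores(hd7970_pes))  # 512
print("1536 -> 2048 PE growth: %.2fx" % (hd7970_pes / previous_pes))  # 1.33x
```

So the step from 1536 to 2048 PEs is a factor 1.33, not 2, and the 2048 PEs come down to 512 double precision cores.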

NVIDIA FERMI
Fermi, Nvidia's 40 nm GPU which currently gets used in HPC, has 3 billion transistors. Here at home I have a few Tesla 2075s with 448 cores, each producing a tad more than 0.5 Tflop double precision, which was a big improvement over the previous generation.

Nvidia's Fermi, on the other hand, clocks 1.644 GHz in the form of the GTX 560, and the 580 clocks 1.544 GHz. For GPGPU this is on the risky side, as getting far over that 1 GHz seems to be a problem. The Teslas are therefore clocked at a safe 1.15 GHz.
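Where the "a tad more than 0.5 Tflop" comes from can be sketched with the usual peak-flops formula. The post states neither of the following, so both are assumptions here: each core retires one fused multiply-add (2 flops) per clock, and double precision runs at half the single precision rate on these Teslas.

```python
# Theoretical peak of a Tesla 2075 from the numbers in the text
# (448 cores at 1.15 GHz).
# Assumptions not stated in the text: 2 flops per core per clock (fused
# multiply-add), and double precision at half the single precision rate.


def peak_gflops(cores: int, clock_ghz: float, flops_per_clock: int = 2,
                dp_ratio: float = 1.0) -> float:
    """Theoretical peak in GFLOPS: cores * clock * flops per clock * rate ratio."""
    return cores * clock_ghz * flops_per_clock * dp_ratio


sp = peak_gflops(448, 1.15)                # single precision
dp = peak_gflops(448, 1.15, dp_ratio=0.5)  # double precision
print(f"Tesla 2075: {sp:.1f} GFLOPS SP, {dp:.1f} GFLOPS DP")
# Tesla 2075: 1030.4 GFLOPS SP, 515.2 GFLOPS DP
```

This is a theoretical peak only. Real code runs below it.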

NVIDIA KEPLER 2012
The new kid on the block from Nvidia is Kepler. It's made in the 28 nm process technology, just like AMD's 7970. Now, I'm not going to redo a review for games; there are great sites for that.

http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/1

Over here we are interested in the implications for Beowulf systems, of course, which I read as the HPC implications. Let's look at the facts and then speculate about what they mean for HPC:

I'm still trying to fully understand the differences, yet it seems as if Nvidia clocked the cores back to 1 GHz. That should make it easier to release a GPU for GPGPU as well. Meanwhile the core count went up to 1536.

The chip itself has 3.5 billion transistors, just 500M more than Fermi, while the process shrink from 40 nm to 28 nm makes each transistor a factor 2.04 smaller in area. That means it will consume less juice, a lot less juice. Benchmarks at AnandTech confirm this.
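The factor 2.04 matches idealized area scaling. Under the simplifying assumption that transistor area scales with the square of the feature size, a 40 nm to 28 nm shrink gives (40 / 28)^2. Real chips never scale this cleanly, so treat it as a rough sketch:

```python
# Idealized process-shrink arithmetic.
# Simplifying assumption: transistor area scales with the square of the
# feature size. Real layouts do not shrink this perfectly.


def area_shrink_factor(old_nm: float, new_nm: float) -> float:
    """Idealized ratio of transistor area between two process nodes."""
    return (old_nm / new_nm) ** 2


factor = area_shrink_factor(40, 28)
print(f"40 nm -> 28 nm: {factor:.2f}x smaller area per transistor")  # 2.04x
```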

Now that's a MASSIVE quantum leap: basically a factor 3 in the number of cores available to HPC (1536 versus Fermi's 512).

In addition, the memory bus is 256 bits wide, versus 384 bits for Fermi. This should make it easier to release 2 GPUs on a single card. Whether Nvidia has those plans for GPGPU Teslas we can only speculate about, but as the chip eats less juice, this time it surely fits within the power envelope. So where the gamer kids can surely expect a 690, for HPC we will of course cheer if Nvidia manages to get to 1.5 - 1.7 Tflop double precision for their new GPU, with the option of moving to 3 - 3.4 Tflop double precision on a 2-GPU Tesla card.
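Here is a rough sketch of how that hoped-for range could come about. None of these assumptions is confirmed anywhere: a Tesla version of the 1536-core Kepler chip running double precision at half the single precision rate (as on the Fermi Teslas), one fused multiply-add (2 flops) per core per clock, and a core clock between 1.0 and 1.1 GHz. `dp_tflops` is a hypothetical helper.

```python
# Hypothetical double precision peak for a Kepler-based Tesla.
# All of these are ASSUMPTIONS, not announced specifications:
#   - 1536 cores (the GTX 680 core count from the text)
#   - double precision at half the single precision rate
#   - 2 flops per core per clock (fused multiply-add)
#   - core clock between 1.0 and 1.1 GHz

KEPLER_CORES = 1536
FLOPS_PER_CLOCK = 2  # fused multiply-add
DP_RATIO = 0.5       # assumed, as on the Fermi Teslas


def dp_tflops(cores: int, clock_ghz: float, gpus_per_card: int = 1) -> float:
    """Hypothetical double precision peak of a card, in Tflop."""
    return cores * clock_ghz * FLOPS_PER_CLOCK * DP_RATIO * gpus_per_card / 1000


for clock in (1.0, 1.1):
    single = dp_tflops(KEPLER_CORES, clock)
    dual = dp_tflops(KEPLER_CORES, clock, gpus_per_card=2)
    print(f"{clock:.1f} GHz: {single:.2f} Tflop per GPU, "
          f"{dual:.2f} Tflop per 2-GPU card")
```

Under these assumptions the loop gives about 1.54 to 1.69 Tflop per GPU and 3.07 to 3.38 Tflop for a dual-GPU card. That lands inside the 1.5 - 1.7 and 3 - 3.4 Tflop ranges hoped for above.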

Note that some might argue that the 680 has less double precision capability than the 580. However, for the Tesla this doesn't matter: what happens with gamer cards is that some transistors get disabled, so the Tesla GPU will be exactly the same chip the kids have, just with double precision enabled. The same was the case with Fermi, so it's logical to expect that to happen with Kepler as well.

Seems like Intel can also scrap their current Knights Corner project, as they have a new goal, namely 4 Tflop, rather than a 1 Tflop manycore :)

As for Nvidia, releasing a new chip for GPGPU that has a factor 3 the power of your previous one sure is a big quantum leap!