[Beowulf] gpu numbers

Bruno Coutinho coutinho at dcc.ufmg.br
Sun Nov 23 17:37:21 PST 2008

2008/11/23 Mark Hahn <hahn at mcmaster.ca>

> one thing I was surprised at is the substantial penalty that the current
> gtx280-based gpus pay for double-precision.
> I think I understand the SP throughput - since these are genetically
> graphics processors, their main flop-relevant op is blend:
>        pixA * alpha + pixB * beta

This is mostly used in texture fetching.
Unfortunately, on NVIDIA cards the texture fetch units can't do
general-purpose processing.
They can only do texture fetch operations, as on older GPUs.

> that's 3 sp flops, and indeed the quoted 933 gflops = 240 cores @ 1.3 GHz *
> 2mul1add/cycle.

The densest instruction that the general-purpose units (stream
processors) can do is multiply-add, so it's: 240 cores @ 1.3 GHz *
1mul1add/cycle = 624 gflops.
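
To make the two ways of counting concrete, here is a small sketch of the
peak-rate arithmetic. The function name peak_gflops is made up for
illustration; the core count and clock are the figures quoted above, with
the clock rounded to 1.3 GHz.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Peak rate = cores * clock (GHz) * flops issued per core per cycle."""
    return cores * clock_ghz * flops_per_cycle


# Multiply-add only: 2 flops per cycle per stream processor.
mad_only = peak_gflops(240, 1.3, 2)

# 2 mul + 1 add: 3 flops per cycle, the counting behind the quoted figure.
mul_plus_mad = peak_gflops(240, 1.3, 3)

print(f"multiply-add only: {mad_only:.0f} gflops")
print(f"2mul1add:          {mul_plus_mad:.0f} gflops")
```

With the clock rounded to 1.3 GHz the second figure comes out slightly above
the quoted 933, which implies an actual clock just under 1.3 GHz.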

>  I'm a little surprised
> that they quote only 78 DP gflops - 1/12 the SP rate.
> I counted ops when doing base-10 multiplication on paper,
> and it seemed to require only 4x each SP mul.  I guess the problem might
> simply be that each core isn't OOO like CPUs,
> or that emulating DP doesn't optimally utilize the available 2mul+add.

Since the main purpose of GTX280-based GPUs is games, the architecture is
heavily focused on SP operations, like the Cell processor.
In Cell, the DP throughput is nearly 1/10 of its SP throughput.
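
For scale, the SP:DP ratios implied by the numbers in this thread can be
checked with a few lines. This is only arithmetic on the quoted figures
(933 and 78 gflops) and on the 624 gflops multiply-add-only figure; the
helper name sp_dp_ratio is made up for illustration.

```python
def sp_dp_ratio(sp_gflops: float, dp_gflops: float) -> float:
    """How many times faster the SP peak is than the DP peak."""
    return sp_gflops / dp_gflops


DP_GFLOPS = 78.0  # quoted DP peak

for label, sp in [("quoted SP peak (933)", 933.0),
                  ("multiply-add-only SP peak (624)", 624.0)]:
    print(f"{label}: SP/DP = {sp_dp_ratio(sp, DP_GFLOPS):.1f}")
```

So the 1/12 ratio depends on counting the extra mul; against the
multiply-add-only peak the DP rate is 1/8.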

> note also: 78 DP Gflops/~200W.  3.2 GHz QC CPU: 51 DP Gflops/~200W.
> figuring power is a bit tricky, but price is even worse.  for power,
> NV claims <200W (not less than 150 in any of the GTX280 reviews, though).
> but you have to add in a host, which will probably be around 300W;
> assuming you go for the C1070, the final is 4*78/(800+300).
> a comparison CPU-based machine would be something like 2*51/350W.
> amusingly, almost the same DP flops per watt ;)

But remember that its memory interface can do 100GB/s, four times the best
Nehalems commercially available, and its cores can have 1024 threads (32
warps), so it has better conditions to sustain high throughput (if your
application uses coherent data access, so we come back to what Michael said).
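
A rough sketch of the comparison, using only figures from this thread. The
CPU bandwidth is not quoted directly; it is derived here from "four times
the best Nehalems" as 100 / 4 = 25 GB/s, which is an assumption, and the
helper names are made up for illustration.

```python
# Figures quoted in this thread.
GPU_DP_GFLOPS = 78.0   # per GPU
CPU_DP_GFLOPS = 51.0   # per 3.2 GHz quad-core CPU
GPU_BW_GBS = 100.0     # GPU memory bandwidth
# Assumption: derived from "four times the best Nehalems".
CPU_BW_GBS = GPU_BW_GBS / 4


def gflops_per_watt(gflops: float, watts: float) -> float:
    """DP throughput per watt for a whole machine."""
    return gflops / watts


def bytes_per_flop(bandwidth_gbs: float, gflops: float) -> float:
    """Memory bytes available per DP flop at peak (GB/s over gflops)."""
    return bandwidth_gbs / gflops


# Machine-level power comparison from the quoted message:
# four GPUs plus a host versus a two-socket CPU box.
gpu_box = gflops_per_watt(4 * GPU_DP_GFLOPS, 800 + 300)
cpu_box = gflops_per_watt(2 * CPU_DP_GFLOPS, 350)
print(f"DP gflops/W   GPU box: {gpu_box:.3f}   CPU box: {cpu_box:.3f}")

# Bandwidth available to feed each DP flop.
gpu_bpf = bytes_per_flop(GPU_BW_GBS, GPU_DP_GFLOPS)
cpu_bpf = bytes_per_flop(CPU_BW_GBS, CPU_DP_GFLOPS)
print(f"bytes/DP flop GPU: {gpu_bpf:.2f}   CPU: {cpu_bpf:.2f}")
```

Under these assumptions the flops per watt are close, but the GPU has
roughly 2.6 times more memory bandwidth per DP flop, which only pays off
when the access pattern lets the memory interface run near its peak.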

> does anyone know whether the reputed hordes of commercial Cuda apps
> mostly stick to SP?

Since CUDA has supported DP only since the GTX 280 was launched, I think the
answer is yes. :)