[Beowulf] gpu numbers

Mark Hahn hahn at mcmaster.ca
Sun Nov 23 15:00:03 PST 2008

one thing I was surprised at is the substantial penalty that the 
current gtx280-based gpus pay for double-precision.
I think I understand the SP throughput - since these are genetically
graphics processors, their main flop-relevant op is blend:
 	pixA * alpha + pixB * beta
that's 3 sp flops, and indeed the quoted 933 gflops = 
240 cores @ 1.3 GHz * 3 flops/cycle (2 mul + 1 add).  I'm a little surprised
that they quote only 78 DP gflops - 1/12 the SP rate.
I counted ops when doing base-10 multiplication on paper,
and it seemed to require only 4x each SP mul.  I guess the 
problem might simply be that each core isn't OOO like CPUs,
or that emulating DP doesn't optimally utilize the available 2mul+add.
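
for anyone who wants to check the arithmetic, here is a rough sketch in
python of the peak numbers above.  the 1.296 GHz shader clock is my
assumption for the exact value behind the rounded "1.3 GHz"; the core
count, flops/cycle and the 78 DP gflops figure are the ones quoted above.

```python
# back-of-envelope peak rates for a GTX280-class gpu
cores = 240
clock_ghz = 1.296      # assumption: exact shader clock behind the rounded 1.3 GHz
flops_per_cycle = 3    # 2 mul + 1 add per core per cycle

sp_gflops = cores * clock_ghz * flops_per_cycle
dp_gflops = 78         # quoted DP peak
ratio = sp_gflops / dp_gflops

print(f"SP peak: {sp_gflops:.0f} gflops")
print(f"SP/DP ratio: {ratio:.1f}")
```

the ratio comes out just under 12, which is where the "1/12 the SP rate"
figure comes from.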

note also: 78 DP Gflops/~200W.  3.2 GHz QC CPU: 51 DP Gflops/~200W.
figuring power is a bit tricky, but price is even worse.  for power,
NV claims <200W (not less than 150 in any of the GTX280 reviews, though).
but you have to add in a host, which will probably be around 300W;
assuming you go for the S1070 (the 4-gpu unit), the final is 4*78/(800+300).
a comparison CPU-based machine would be something like 2*51/350W.
amusingly, almost the same DP flops per watt ;)
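
the same comparison as a small python sketch, using only the rough
wattages and peak rates from above.  the "4 DP flops/cycle/core" used to
reproduce the 51 gflops CPU figure is my assumption about how that number
was derived, not something taken from a spec sheet.

```python
# rough DP gflops per watt, GPU box vs CPU box

# GPU side: 4 gpus at 78 DP gflops each, ~800W for the gpus plus ~300W host
gpu_gflops = 4 * 78
gpu_watts = 800 + 300
gpu_eff = gpu_gflops / gpu_watts

# CPU side: dual-socket box, 51 DP gflops per quad-core chip, ~350W total
cpu_chip_gflops = 4 * 3.2 * 4   # assumption: 4 cores * 3.2 GHz * 4 DP flops/cycle
cpu_gflops = 2 * 51
cpu_watts = 350
cpu_eff = cpu_gflops / cpu_watts

print(f"per-chip CPU peak: {cpu_chip_gflops:.1f} DP gflops")
print(f"GPU box: {gpu_eff:.3f} DP gflops/W")
print(f"CPU box: {cpu_eff:.3f} DP gflops/W")
```

both land at a bit under 0.3 DP gflops/W, with the CPU box slightly ahead.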

does anyone know whether the reputed hordes of commercial CUDA apps
mostly stick to SP?
