[Beowulf] gpu numbers
Mark Hahn
hahn at mcmaster.ca
Sun Nov 23 15:00:03 PST 2008
one thing I was surprised at is the substantial penalty that the
current gtx280-based gpus pay for double-precision.
I think I understand the SP throughput - since these are genetically
graphics processors, their main flop-relevant op is blend:
pixA * alpha + pixB * beta
that's 3 sp flops, and indeed the quoted 933 gflops =
240 cores * 1.3 GHz * (2 mul + 1 add)/cycle. I'm a little surprised
that they quote only 78 DP gflops - 1/12 the SP rate.
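
as a sanity check on the peak figure above, here is a minimal sketch of the arithmetic in Python. the helper name peak_gflops is made up for illustration, and the 1.296 GHz shader clock is an assumption (the post rounds it to 1.3 GHz); the 240 cores, 3 flops/cycle and 78 DP gflops come from the post.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak = cores * clock (GHz) * flops issued per core per cycle."""
    return cores * clock_ghz * flops_per_cycle

# Assumption: 1.296 GHz shader clock; the post says "1.3 GHz".
sp_peak = peak_gflops(cores=240, clock_ghz=1.296, flops_per_cycle=3)

dp_quoted = 78.0  # DP gflops as quoted in the post
sp_to_dp_ratio = sp_peak / dp_quoted

print(f"SP peak: {sp_peak:.1f} gflops")
print(f"SP/DP ratio: {sp_to_dp_ratio:.1f}")
```

with the rounded 1.3 GHz clock the same formula gives 936 rather than 933, so the quoted number only works out with the slightly lower clock.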
I counted ops when doing base-10 multiplication on paper,
and it seemed to require only 4x each SP mul. I guess the
problem might simply be that each core isn't OOO like CPUs,
or that emulating DP doesn't optimally utilize the available 2mul+add.
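
the "4x" count from paper multiplication can be sketched as below: split each operand into a high and a low half and form the four narrow partial products. this is only an illustration of the operation count on integers, not a claim about how the hardware actually does DP; the function name schoolbook_mul and the choice of base are assumptions for the example.

```python
def schoolbook_mul(a, b, base=10):
    """Multiply two 2-"digit" numbers (each < base**2) using only
    single-digit multiplies. Returns (product, number_of_narrow_multiplies)."""
    a_hi, a_lo = divmod(a, base)
    b_hi, b_lo = divmod(b, base)

    # Four narrow partial products, as in long multiplication on paper.
    lo_lo = a_lo * b_lo
    lo_hi = a_lo * b_hi
    hi_lo = a_hi * b_lo
    hi_hi = a_hi * b_hi
    narrow_muls = 4

    # Scaling by powers of the base is a shift, not counted as a multiply.
    product = lo_lo + (lo_hi + hi_lo) * base + hi_hi * base * base
    return product, narrow_muls

product, muls = schoolbook_mul(47, 86)
print(product, muls)
```

the same function works with base=2**16 on 32-bit operands. note that it leaves out the additions and carry handling, which a real emulation would also have to pay for.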
note also: 78 DP Gflops/~200W. 3.2 GHz QC CPU: 51 DP Gflops/~200W.
figuring power is a bit tricky, but price is even worse. for power,
NV claims <200W (not less than 150 in any of the GTX280 reviews, though).
but you have to add in a host, which will probably be around 300W;
assuming you go for the S1070, the total is 4*78/(800+300).
a comparison CPU-based machine would be something like 2*51/350W.
amusingly, almost the same DP flops per watt ;)
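
the flops-per-watt comparison above, written out as a small Python sketch. all wattages and gflops figures are the rough estimates from the post; the function name gflops_per_watt is just for illustration.

```python
def gflops_per_watt(gflops, watts):
    """DP gflops delivered per watt of estimated power draw."""
    return gflops / watts

# Chip-level figures from the post: both at roughly 200 W.
gpu_chip = gflops_per_watt(78, 200)
cpu_chip = gflops_per_watt(51, 200)

# System-level: four GPUs at 78 gflops (800 W) plus a 300 W host,
# versus a dual-socket CPU box at 2 * 51 gflops and 350 W.
gpu_system = gflops_per_watt(4 * 78, 800 + 300)
cpu_system = gflops_per_watt(2 * 51, 350)

print(f"chip:   GPU {gpu_chip:.3f}  CPU {cpu_chip:.3f} gflops/W")
print(f"system: GPU {gpu_system:.3f}  CPU {cpu_system:.3f} gflops/W")
```

at the chip level the GPU is ahead (0.39 vs about 0.26 gflops/W), but once the host is counted the two systems land within a few percent of each other.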
does anyone know whether the reputed hordes of commercial Cuda apps
mostly stick to SP?