[Beowulf] gpu numbers


Mark Hahn hahn at mcmaster.ca
Sun Nov 23 15:00:03 PST 2008


one thing I was surprised at is the substantial penalty that the 
current gtx280-based gpus pay for double-precision.
I think I understand the SP throughput - since these are genetically
graphics processors, their main flop-relevant op is blend:
 	pixA * alpha + pixB * beta
that's 3 sp flops, and indeed the quoted 933 gflops = 
240 cores @ 1.3 GHz * 3 flops/cycle (2 mul + 1 add).  I'm a little surprised
that they quote only 78 DP gflops - 1/12 the SP rate.
I counted ops when doing base-10 multiplication on paper,
and it seemed to require only 4x each SP mul.  I guess the 
problem might simply be that each core isn't OOO like CPUs,
or that emulating DP doesn't optimally utilize the available 2mul+add.
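
The peak-rate arithmetic above can be checked with a short sketch. The core count, flops per cycle, and the 78 DP gflops figure come from the post; the unrounded 1.296 GHz shader clock is an assumption standing in for the "1.3 GHz" quoted there, and the function and variable names are only illustrative.

```python
# Peak-throughput arithmetic for the GTX280 figures quoted in the post.

CORES = 240                 # from the post
SHADER_CLOCK_GHZ = 1.296    # assumption: unrounded form of the post's "1.3 GHz"
SP_FLOPS_PER_CYCLE = 3      # 2 mul + 1 add per core per cycle, from the post
QUOTED_DP_GFLOPS = 78.0     # vendor-quoted DP peak, from the post


def peak_gflops(units: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Peak rate = execution units * clock (GHz) * flops issued per cycle."""
    return units * clock_ghz * flops_per_cycle


sp_peak = peak_gflops(CORES, SHADER_CLOCK_GHZ, SP_FLOPS_PER_CYCLE)
sp_to_dp_ratio = sp_peak / QUOTED_DP_GFLOPS

print(f"SP peak: {sp_peak:.0f} gflops")
print(f"SP/DP ratio: {sp_to_dp_ratio:.1f}")
```

With these inputs the SP peak lands on the quoted 933 gflops and the SP/DP ratio comes out just under 12, which is where the "1/12" figure comes from.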

note also: gtx280: 78 DP Gflops/~200W.  3.2 GHz quad-core CPU: 51 DP Gflops/~200W.
figuring power is a bit tricky, but price is even worse.  for power,
NV claims <200W (not less than 150 in any of the GTX280 reviews, though).
but you have to add in a host, which will probably be around 300W;
assuming you go for the S1070 (4 gpus), the final is 4*78/(800+300),
or about 0.28 DP Gflops/W.  a comparison CPU-based machine would be
something like 2*51/350W, or about 0.29 DP Gflops/W.
amusingly, almost the same DP flops per watt ;)
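
The flops-per-watt comparison can be reproduced as follows. This is only a sketch using the rough wattages given in the post (800W for the four-GPU unit, 300W for its host, 350W for a dual-socket CPU box); the helper name is made up for illustration, and none of the power numbers are measurements.

```python
# DP gflops per watt, using the rough estimates from the post.

def gflops_per_watt(units: int, gflops_each: float, total_watts: float) -> float:
    """Aggregate peak DP gflops divided by total system power draw."""
    return units * gflops_each / total_watts


# Four GPUs at 78 DP gflops each; ~800W for the GPU unit plus ~300W host.
gpu_system = gflops_per_watt(4, 78.0, 800.0 + 300.0)

# Two quad-core CPUs at 51 DP gflops each in a ~350W machine.
cpu_system = gflops_per_watt(2, 51.0, 350.0)

print(f"GPU system: {gpu_system:.3f} DP gflops/W")
print(f"CPU system: {cpu_system:.3f} DP gflops/W")
print(f"GPU/CPU:    {gpu_system / cpu_system:.2f}")
```

Under these assumptions the two systems end up within a few percent of each other, so the result is quite sensitive to the host and CPU wattage guesses.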

does anyone know whether the reputed hordes of commercial Cuda apps
mostly stick to SP?


