[Beowulf] gpu numbers
Mark Hahn hahn at mcmaster.ca
Sun Nov 23 15:00:03 PST 2008
- Previous message: [Beowulf] OpenMP on AMD dual core processors
- Next message: [Beowulf] gpu numbers
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
one thing I was surprised at is the substantial penalty that the current gtx280-based gpus pay for double-precision. I think I understand the SP throughput: since these are genetically graphics processors, their main flop-relevant op is blend:

    pixA * alpha + pixB * beta

that's 3 sp flops (2 mul + 1 add), and indeed the quoted 933 gflops = 240 cores @ 1.3 GHz * (2 mul + 1 add)/cycle.

I'm a little surprised that they quote only 78 DP gflops, 1/12 the SP rate. I counted ops when doing base-10 multiplication on paper, and a DP mul seemed to require only about 4x the work of an SP mul. I guess the problem might simply be that each core isn't OOO like CPUs, or that emulating DP doesn't optimally utilize the available 2 mul + add.

note also:

    GTX280:         78 DP Gflops/~200W
    3.2 GHz QC CPU: 51 DP Gflops/~200W

figuring power is a bit tricky, but price is even worse. for power, NV claims <200W (not less than 150 in any of the GTX280 reviews, though). but you have to add in a host, which will probably be around 300W; assuming you go for the C1070, the final is 4*78/(800+300). a comparison CPU-based machine would be something like 2*51/350W. amusingly, almost the same DP flops per watt ;)

does anyone know whether the reputed hordes of commercial Cuda apps mostly stick to SP?
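For anyone who wants to redo the arithmetic above, here is a minimal Python sketch. It uses only the figures quoted in the post (240 cores, 1.3 GHz, 3 flops/cycle, 78 and 51 DP Gflops, and the 800 W + 300 W and 350 W power guesses). The function and variable names are made up for illustration, and the wattages are the post's rough estimates, not measurements.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak: every core retires flops_per_cycle flops each cycle."""
    return cores * clock_ghz * flops_per_cycle


def gflops_per_watt(gflops, watts):
    """Throughput per watt for a whole system."""
    return gflops / watts


# SP peak from the post: 240 cores @ 1.3 GHz, 2 mul + 1 add per cycle.
sp_peak = peak_gflops(cores=240, clock_ghz=1.3, flops_per_cycle=3)

# Quoted figures from the post (not derived here).
SP_QUOTED = 933.0  # Gflops, single precision
DP_QUOTED = 78.0   # Gflops, double precision
CPU_DP = 51.0      # Gflops, 3.2 GHz quad-core CPU

dp_to_sp_ratio = DP_QUOTED / SP_QUOTED

# GPU box: 4 GPUs plus a host, using the post's power estimates (assumed).
gpu_system = gflops_per_watt(4 * DP_QUOTED, 800 + 300)

# CPU box: two quad-core sockets, ~350 W total (assumed, from the post).
cpu_system = gflops_per_watt(2 * CPU_DP, 350)

if __name__ == "__main__":
    print(f"SP peak at 1.3 GHz:  {sp_peak:.0f} Gflops")
    print(f"DP/SP ratio:         1/{1 / dp_to_sp_ratio:.1f}")
    print(f"GPU system:          {gpu_system:.3f} DP Gflops/W")
    print(f"CPU system:          {cpu_system:.3f} DP Gflops/W")
```

With the rounded 1.3 GHz clock the SP peak comes out at 936 Gflops, slightly above the quoted 933, which suggests the real shader clock is a hair under 1.3 GHz. The two systems land at roughly 0.28 and 0.29 DP Gflops/W, which is the "almost the same" result noted above.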
