[Beowulf] crunch per kilowatt: GPU vs. CPU
Craig Tierney
Craig.Tierney at noaa.gov
Mon May 18 12:51:02 PDT 2009
Bill Broadley wrote:
> Joe Landman wrote:
>> Hi David
>>
>> David Mathog wrote:
>>> Although the folks now using CUDA are likely most interested in crunch
>>> per unit time (time efficiency), perhaps some of you have measurements
>>> and can comment on the energy efficiency of GPU vs. CPU computing? That
>>> is, which uses the fewest kilowatts per unit of computation. My guess
>> Using theoretical rather than "actual" performance, unless you get the
>> same code doing the same computation on both units:
>>
>> 1 GPU ~ 960 GFLOP single precision, ~100 GFLOP double precision @ 160W
>
> That sounds like the Nvidia flavor GPU, granted nvidia does seem to have a
> larger lead over ATI for such use... at least till OpenCL gains more
> popularity. Nvidia's double precision rate is approximately 1/12th their
> single precision rate. ATI's is around 1/5th, which works out to around 240 GFlops.
>
Where did you get the 1/12th number for NVIDIA? Each streaming multiprocessor (SM)
has 1 single precision FPU per thread (8 threads per SM), but only 1 double precision FPU
on the SM. So that ratio would be 1/8. I have demonstrated this ratio with a simple
code that required little to no memory transfers.
ATI still provides more dp flops.
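The two ratios in this thread can both be derived from the same GT200-generation hardware; a quick sketch, assuming a GTX 280-class part (30 SMs, 8 SP cores and 1 DP unit per SM, 1.296 GHz shader clock):

```python
# Theoretical GT200 throughput -- a sketch of where the competing
# 1/8 and 1/12 SP:DP ratios can both come from.
SMS = 30              # streaming multiprocessors (assumed GTX 280-class)
SP_CORES_PER_SM = 8   # single precision FPUs per SM
DP_UNITS_PER_SM = 1   # double precision FPUs per SM
CLOCK_GHZ = 1.296     # shader clock

# Single precision: NVIDIA's headline number counts a MAD (2 flops)
# dual-issued with a MUL (1 flop) per core per cycle, i.e. 3 flops/cycle.
sp_marketing = SMS * SP_CORES_PER_SM * 3 * CLOCK_GHZ  # ~933 GFLOPS
sp_mad_only  = SMS * SP_CORES_PER_SM * 2 * CLOCK_GHZ  # ~622 GFLOPS

# Double precision: one fused multiply-add unit per SM, 2 flops/cycle.
dp = SMS * DP_UNITS_PER_SM * 2 * CLOCK_GHZ            # ~78 GFLOPS

print(round(sp_marketing / dp))  # 12 -- the 1/12 figure, vs. the marketing SP rate
print(round(sp_mad_only / dp))   # 8  -- the 1/8 figure, counting only the MAD units
```

So the disagreement may simply be about which single precision number you divide by: the quoted peak including the extra MUL gives 1/12, while counting only the MAD units per SM gives the 1/8 that a simple compute-bound kernel measures.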
Craig
> So in both cases you get a pretty hefty jump if your application is single
> precision friendly.
>
> Of course such performance numbers are extremely application specific. I've
> seen performance increases published that are a good bit better (and worse)
> than the GFlop numbers would indicate. If you go to http://arxiv.org and type
> CUDA in as a search word, there are 10-ish papers that talk about various uses.
>
> So basically it depends; either AMD, Intel, Nvidia, or ATI wins depending on
> your application. Of course there's other power-efficient competition as
> well: Atom, VIA Nano[1], SiCortex (MIPS), BlueGene, and the latest
> implementation, the PowerXCell 8i, which is available in the QS22.
>
> Assuming you have source code and a parallel-friendly application, there are
> quite a few options available. Ideally future benchmarks would include power;
> maybe add it as a requirement for future SPEC benchmark submissions.
>
> [1] http://www.theinquirer.net/inquirer/news/1137366/dell-via-nano-servers
>
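To put David's original kilowatts-per-computation question in terms of the peak figures quoted above (~960 GFLOPS single / ~100 GFLOPS double at ~160 W), a back-of-the-envelope sketch; real codes rarely sustain peak, so treat these as lower bounds on energy per flop:

```python
# Energy per unit of computation from the thread's peak numbers.
# Watts / GFLOPS = joules per GFLOP of delivered work at peak rate.
def joules_per_gflop(watts: float, gflops: float) -> float:
    """Energy in joules to deliver one GFLOP of work at peak rate."""
    return watts / gflops

print(joules_per_gflop(160, 960))  # ~0.17 J/GFLOP, single precision
print(joules_per_gflop(160, 100))  # 1.6 J/GFLOP, double precision
```

By this measure the same board is roughly 10x less energy efficient per flop in double precision, which is why the SP:DP ratio matters as much for the power question as for the time-to-solution one.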
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
--
Craig Tierney (craig.tierney at noaa.gov)