[Beowulf] Nvidia, cuda, tesla and... where's my double floating point?

Chris Samuel csamuel at vpac.org
Sun May 25 04:31:15 PDT 2008

----- "Ricardo Reis" <rreis at aero.ist.utl.pt> wrote:

> I've read somewhere that double precision performance from AMD wasn't
> very good

SP is about 500 GFLOPs and DP is apparently about 125-250 GFLOPs:


# AMD's Dave "Wavey" Baumann (of ex-Beyond3D fame) told us that
# while AMD's RV670 chip is supporting double-precision units,
# it does not feature individual units for FP64, but uses the
# FP32 units to do FP64 calculations over a number of cycles.
# And yes, this process takes time. Depending on complexity of
# operation, the best case scenario is around half the original
# SP FP32 performance about 250 GFLOPs; in a worst case, the
# performance should be about a quarter of its FP32 performance
# - or about 125 GFLOPs. Dave told us that the chip usually
# averages out somewhere in between, which is actually quite a
# feat for a chip that does not feature native FP64 units.
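
As a back-of-the-envelope check on those figures, here is a minimal
sketch of the arithmetic. It assumes the "about 500 GFLOPs" SP figure
above and models the half-rate and quarter-rate cases as a simple
slowdown factor of 2 and 4. The function name and the slowdown
parameter are my own illustration, not anything from AMD, and this is
an estimate, not a measurement.

```python
# Rough estimate only: SP_PEAK_GFLOPS is the "about 500" figure quoted
# in this thread, not a measured or vendor-specified value.
SP_PEAK_GFLOPS = 500.0


def emulated_dp_gflops(sp_peak, slowdown):
    """Estimate DP throughput when FP64 work runs on the FP32 units.

    `slowdown` is a hypothetical factor: how many times slower a DP
    operation is than an SP one (2 = half rate, 4 = quarter rate).
    """
    return sp_peak / slowdown


best = emulated_dp_gflops(SP_PEAK_GFLOPS, 2)   # best case: half the SP rate
worst = emulated_dp_gflops(SP_PEAK_GFLOPS, 4)  # worst case: a quarter of it
print(f"DP estimate: {worst:.0f} to {best:.0f} GFLOPs")
```

That gives 125 to 250 GFLOPs, matching the range quoted above.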

> and their programming model goes more towards assembly...

I haven't looked into that, but I do believe that AMD are working
to support FireStream offloading in their maths library (ACML).

> Besides,  AMD/ATI still have to convince on their linux drivers.

I'd much rather see the specs released than drivers
(both in an ideal world), and they do seem to be going
the right way with that at the moment.

Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
