[Beowulf] OT? GPU accelerators for finite difference time domain

Sun Apr 1 17:53:52 PDT 2007

On 4/1/07, Mark Hahn <hahn at mcmaster.ca> wrote:

>
> I assume this is only single-precision, and I would guess that for
> numerical stability, you must be limited to fairly short FFTs.
> What kind of peak flops do you see?  What's the overhead of shoving
> data onto the GPU, and getting it back?  (Or am I wrong that the GPU
> cannot do an FFT in main (host) memory?)

I will run some benchmarks in the next few days (I usually do more than
just an FFT).
I remember some numbers for SGEMM (real SGEMM, C = alpha*A*B + beta*C):
120 Gflops on the board and 80 Gflops measured from the host (including
all the I/O overhead), for N=2048.
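For scale, a back-of-the-envelope check of those SGEMM figures. It assumes the conventional 2*N^3 flop count for the matrix product (ignoring the O(N^2) alpha/beta terms), and that A, B and C are copied to the card and C is copied back. Attributing the whole gap between 120 and 80 Gflops to bus transfers is an assumption for illustration, not a measurement.

```python
# Back-of-the-envelope SGEMM timing for C = alpha*A*B + beta*C.
# Assumption: 2*N^3 flops (multiply-adds of A*B); O(N^2) terms ignored.
N = 2048
BYTES_PER_FLOAT = 4            # single precision

flops = 2 * N**3
t_board = flops / 120e9        # seconds at the on-board rate
t_host = flops / 80e9          # seconds at the rate seen from the host
overhead = t_host - t_board    # time lost to I/O and setup

# Assumption: A, B, C go to the GPU and C comes back -> 4 matrices moved.
bytes_moved = 4 * N * N * BYTES_PER_FLOAT
implied_bw = bytes_moved / overhead   # bytes/s if all overhead were transfer

print(f"work        : {flops / 1e9:.2f} Gflop")
print(f"on board    : {t_board * 1e3:.1f} ms")
print(f"from host   : {t_host * 1e3:.1f} ms")
print(f"overhead    : {overhead * 1e3:.1f} ms")
print(f"data moved  : {bytes_moved / 1e6:.1f} MB")
print(f"implied bw  : {implied_bw / 1e9:.2f} GB/s")
```

Under those assumptions the gap is about 72 ms for roughly 67 MB of data, i.e. an effective rate a little under 1 GB/s.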

>
> > You can offload only the compute-intensive parts of your code to the GPU
> > from C and C++ (writing a wrapper to call them from Fortran should be trivial).
>
> sure, but what's the cost (in time and CPU overhead) to moving data
> around like this?

It depends on your chipset and on other details (cold access, data
in cache, pinned memory): transfer rates range from around 1 GB/s to 3 GB/s.
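With rates in that range, whether offloading pays off is mostly a matter of flops per byte moved. A minimal cost model, assuming transfers and kernels do not overlap; the 10 Gflops host rate is hypothetical, and using the 120 Gflops SGEMM rate for every kernel is deliberately optimistic.

```python
def transfer_time(nbytes: float, bandwidth: float) -> float:
    """Seconds to move nbytes across the bus at `bandwidth` bytes/s."""
    return nbytes / bandwidth


def offload_wins(flops: float, nbytes: float, cpu_flops: float,
                 gpu_flops: float, bandwidth: float) -> bool:
    """True if copy-in + GPU compute + copy-out beats computing on the host.

    Assumes no overlap of transfers and compute; nbytes counts both directions.
    """
    t_cpu = flops / cpu_flops
    t_gpu = transfer_time(nbytes, bandwidth) + flops / gpu_flops
    return t_gpu < t_cpu


CPU_FLOPS = 10e9    # hypothetical host rate
GPU_FLOPS = 120e9   # on-board SGEMM figure from the thread, used optimistically

n = 16 * 1024 * 1024
N = 2048
cases = {
    # y = a*x + y: 2n flops; x and y in, y out -> 3n floats
    "saxpy n=16M": (2 * n, 3 * n * 4),
    # SGEMM: 2N^3 flops; A, B, C in, C out -> 4N^2 floats
    "sgemm N=2048": (2 * N**3, 4 * N * N * 4),
}

for bw in (1e9, 3e9):   # the two ends of the quoted 1-3 GB/s range
    for name, (flops, nbytes) in cases.items():
        wins = offload_wins(flops, nbytes, CPU_FLOPS, GPU_FLOPS, bw)
        print(f"{bw / 1e9:.0f} GB/s  {name}: offload wins = {wins}")
```

The design point is the ratio of work to data: SGEMM does O(N) flops per element moved, so the transfer cost is amortized, while a streaming update like saxpy does O(1) flops per element and is better left on the host.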

>
> > The current generation of the hardware supports only single precision,
> > but there will be a double precision version towards the end of the
> > year.
>
> do you mean synthetic doubles?  I'm guessing that the hardware isn't
> going to gain the much wider multipliers necessary to support doubles
> at the same latency as singles...
>
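One reading of "synthetic doubles" is emulating extra precision with pairs of single-precision values (this reading is an assumption, not something stated in the thread). The basic building block of such float-float arithmetic is an error-free addition such as Knuth's TwoSum, sketched here with NumPy float32 scalars. A pair built this way carries roughly 48 significand bits, short of a true double's 53, and each emulated add costs several single-precision operations.

```python
import numpy as np


def two_sum(a: np.float32, b: np.float32):
    """Knuth's TwoSum in single precision.

    Returns (s, e) with s = fl(a + b) and a + b == s + e exactly,
    assuming IEEE round-to-nearest and no overflow.
    """
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


a = np.float32(1.0)
b = np.float32(1e-7)     # below float32 resolution near 1.0 (~1.2e-7)
s, e = two_sum(a, b)
print(s, e)              # e holds the part of b lost when rounding s
# Check exactness in double precision, where a + b is representable:
print(np.float64(s) + np.float64(e) == np.float64(a) + np.float64(b))
```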

Can't comment on this one..... :-)

Massimiliano