[Beowulf] OT? GPU accelerators for finite difference time domain
Massimiliano Fatica
mfatica at gmail.com
Sun Apr 1 17:53:52 PDT 2007
On 4/1/07, Mark Hahn <hahn at mcmaster.ca> wrote:
>
> I assume this is only single-precision, and I would guess that for
> numerical stability, you must be limited to fairly short fft's.
> what kind of peak flops do you see? what's the overhead of shoving
> data onto the GPU, and getting it back? (or am I wrong that the GPU
> cannot do an FFT in main (host) memory?)
I will run some benchmarks in the next few days (I usually do more than
just an FFT).
I remember some numbers for SGEMM (real SGEMM, C = alpha*A*B + beta*C):
120 Gflops on board, 80 Gflops measured from the host (with all the I/O
overhead), for N=2048.
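
To put those two rates in terms of wall time, here is a rough sketch. It assumes the usual convention that a square N x N SGEMM costs about 2*N^3 floating-point operations (lower-order terms ignored); the function names are made up for illustration.

```python
# Back-of-the-envelope conversion of the quoted SGEMM rates into wall time.
# Assumption: a square N x N SGEMM costs about 2*N**3 floating-point operations.

def sgemm_flop(n: int) -> int:
    """Approximate flop count for C = alpha*A*B + beta*C with N x N matrices."""
    return 2 * n ** 3


def seconds_at(n: int, gflops: float) -> float:
    """Wall time implied by a sustained rate given in Gflops (1e9 flop/s)."""
    return sgemm_flop(n) / (gflops * 1e9)


if __name__ == "__main__":
    n = 2048
    t_board = seconds_at(n, 120.0)  # rate quoted for the board alone
    t_host = seconds_at(n, 80.0)    # rate quoted as seen from the host
    print(f"on board:         {t_board * 1e3:.1f} ms")
    print(f"seen from host:   {t_host * 1e3:.1f} ms")
    print(f"implied overhead: {(t_host - t_board) * 1e3:.1f} ms")
```

Under that assumption, the gap between the two rates corresponds to roughly 70 ms of I/O overhead per call at N=2048.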
>
> > You can offload only the compute-intensive parts of your code to the GPU
> > from C and C++ (writing a wrapper for Fortran should be trivial).
>
> sure, but what's the cost (in time and CPU overhead) to moving data
> around like this?
It depends on your chipset and on other details (cold access, data
in cache, pinned memory): it ranges from around 1 GB/s to 3 GB/s.
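
A simple model shows how that bandwidth range feeds into the rate seen from the host. This is only a sketch with several assumptions of mine: 1 GB = 1e9 bytes, single-precision N x N matrices, A, B and C copied to the board and C copied back (four matrix transfers), the same bandwidth in both directions, and no overlap between copies and compute. The function names are hypothetical.

```python
# Toy model: host-visible SGEMM rate = flops / (compute time + copy time).
# Assumptions (not from the thread): 4 matrix transfers (A, B, C in; C out),
# 4-byte floats, symmetric bandwidth, copies not overlapped with compute.

def transfer_bytes(n: int, matrices_moved: int = 4, bytes_per_elem: int = 4) -> int:
    """Bytes crossing the bus for one N x N SGEMM call."""
    return matrices_moved * n * n * bytes_per_elem


def effective_gflops(n: int, onboard_gflops: float, bandwidth_gb_s: float) -> float:
    """Rate seen from the host once copy time is added to compute time."""
    flop = 2 * n ** 3                                  # usual GEMM flop count
    compute_s = flop / (onboard_gflops * 1e9)
    copy_s = transfer_bytes(n) / (bandwidth_gb_s * 1e9)
    return flop / (compute_s + copy_s) / 1e9


if __name__ == "__main__":
    for bw in (1.0, 3.0):
        rate = effective_gflops(2048, 120.0, bw)
        print(f"{bw:.0f} GB/s -> {rate:.1f} Gflops from the host")
```

With 120 Gflops on board and N=2048, this model gives about 82 Gflops at 1 GB/s and about 104 Gflops at 3 GB/s. The lower figure is in the same range as the 80 Gflops measured from the host, but that is only a consistency check on the model, not a measurement.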
>
> > The current generation of the hardware supports only single precision,
> > but there will be a double precision version towards the end of the
> > year.
>
> do you mean synthetic doubles? I'm guessing that the hardware isn't
> going to gain the much wider multipliers necessary to support doubles
> at the same latency as singles...
>
Can't comment on this one..... :-)
Massimiliano