[Beowulf] OT? GPU accelerators for finite difference time domain

Massimiliano Fatica mfatica at gmail.com
Sun Apr 1 12:30:32 PDT 2007

CUDA comes with full BLAS and FFT libraries (the FFT library supports 1D, 2D and 3D transforms).
You can get a significant speedup even for 2D transforms or for a batch of 1D transforms.

You can offload only the compute-intensive parts of your code to the GPU
from C and C++ (writing a wrapper to call them from Fortran should be trivial).

The current generation of the hardware supports only single precision,
but there will be a double precision version towards the end of the

PS: I work on CUDA at Nvidia, so I may be a little biased...

On 4/1/07, Mark Hahn <hahn at mcmaster.ca> wrote:
> as far as I know, there are not any well-developed libraries which simply
> harness whatever GPU you provide, but don't require your whole program to
> be GPU-ized.  the cost of sharing data with a GPU is significant, but
> blas-3 might have a high enough work-to-size ratio to make it feasible.
> 3d fft's might also be expressible in GPU-friendly terms (the trick would
> be to utilize not fight the GPU's inherent memory-access preferences.)
> perhaps some MCMC stuff might be SIMD-able?  I doubt that sequence analysis
> would make much sense, since GPUs are not well-tuned to access host memory,
> and sequence programs are not actually that compute-intensive.  I'd guess
> that anything involving sparse matrices would be difficult to do on a GPU.
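
The work-to-size argument above can be put in numbers. The sketch below computes flops per byte moved across the bus for one offloaded call, assuming single precision (4-byte reals, 8-byte complex), all inputs copied to the GPU, the result copied back, and no data reuse between calls. The operation counts are the usual textbook ones (2n^3 for GEMM, roughly 5 N log2 N for a complex FFT), not measurements, and the function names are illustrative.

```c
/* Flops per byte transferred for one offloaded call (estimates only). */

/* SAXPY, y = a*x + y: 2n flops; x, y in and y out = 3n floats. */
double axpy_intensity(double n) { return (2.0 * n) / (3.0 * n * 4.0); }

/* SGEMV, y = A*x with A n x n: 2n^2 flops; A, x in and y out. */
double gemv_intensity(double n) { return (2.0 * n * n) / ((n * n + 2.0 * n) * 4.0); }

/* SGEMM, C = A*B with n x n matrices: 2n^3 flops; A, B in and C out. */
double gemm_intensity(double n) { return (2.0 * n * n * n) / (3.0 * n * n * 4.0); }

/* Complex FFT of N = 2^log2_n points (1D, 2D or 3D): about
 * 5 N log2 N flops; N complex values in and N out. */
double fft_intensity(int log2_n) { return (5.0 * log2_n) / (2.0 * 8.0); }
```

SAXPY and SGEMV stay at a constant fraction of a flop per byte (1/6 and just under 1/2), while SGEMM grows as n/6, about 680 flops per byte at n = 4096. A 3D FFT sits in between: a 256^3 transform (log2 N = 24) comes out at 7.5 flops per byte.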
