[Beowulf] GP-GPU experience

Mon Apr 4 09:54:37 PDT 2011

If you are old enough to remember the time when the first distributed
computers appeared on the scene, this is déjà vu. Developers used to
programming on shared memory (mostly with directives) complained
about the new programming models (PVM, MPL, MPI).
Even today, if you have a serial code, there is no tool that will make
your code run on a cluster.
Even on a single system, if you try an auto-parallel/auto-vectorizing
compiler on a real code, your results will probably be disappointing.

When you can get a 10x boost on a production code by rewriting some
portions of it to use the GPU, and time to solution is important or
you can perform simulations that were impossible before (for example,
using algorithms that were just too slow on CPUs; the Discontinuous
Galerkin method is a perfect example), there are a lot of developers
who will write the code.
The effort clearly depends on the code, the programmer, and the tool
used (you can go from fully custom GPU code with CUDA or OpenCL, to
automatically generated CUF kernels from PGI, to directives using
HMPP or PGI Accelerator).
In situations where time to solution relates to money, for example
oil and gas, GPUs are the answer today (you will be surprised
by the number of GPUs in Houston).
Look at the performance and scaling of AMBER (MPI + CUDA),
http://ambermd.org/gpus/benchmarks.htm, and tell me that the results
were not worth the effort.

Is GPU programming for everyone? Probably not, in the same measure
that parallel programming is not for everyone.
Better tools will lower the threshold, but a threshold will always be present.

Massimiliano
PS: Full disclosure, I work at Nvidia on CUDA (CUDA Fortran,
application porting with CUDA, MPI+CUDA).

2011/4/4 "C. Bergström" <cbergstrom at pathscale.com>:
> Herbert Fruchtl wrote:
>> They hear great success stories (which in reality are often prototype
>> implementations that do one carefully chosen benchmark well), then look at the
>> API, look at their existing code, and postpone the start of their project until
>> they have six months spare time for it. And we know when that is.
>>
>> The current approach with more or less vendor specific libraries (be they "open"
>> or not) limits the uptake of GPU computing to a few hardcore developers of
>> experimental codes who don't mind rewriting their code every two years. It won't
>> become mainstream until we have a compiler that turns standard Fortran (or C++,
>> if it has to be) into GPU code. Anything that requires more change than let's
>> say OpenMP directives is doomed, and rightly so.
>>
> Hi Herbert,
>
> I think your perspective pretty much nails it
>
> (shameless self promotion)
> http://www.pathscale.com/ENZO (PathScale HMPP - native codegen)
> http://www.pathscale.com/pdf/PathScale-ENZO-1.0-UserGuide.pdf
> http://www.caps-entreprise.com/hmpp.html (CAPS HMPP - source to source)
>
> This is really only the tip of the problem and there must also be
> solutions for scaling *efficiently* across the cluster.  (No MPI + CUDA
> or even HMPP is *not* the answer imho.)
>
> ./C