[Beowulf] What class of PDEs/numerical schemes suitable for GPU clusters

Joe Landman landman at scalableinformatics.com
Thu Nov 20 07:43:15 PST 2008

Quick interjection from the SC08 show floor.

Mark Hahn wrote:
>> As we know by now GPUs can run some problems many times faster than CPUs
> it's good to cultivate some skepticism.  the paper that quotes 40x
> does so with a somewhat tilted comparison.  (I consider this comparison
> fair: a host with 2x 3.2 GHz QC Core2 vs 1 current high-end GPU card.
> former delivers 102.4 SP Gflops; latter is something like 1.2 Tflop.
> those are all peak/theoretical.  the nature of the problem determines
> how much slower real workloads are - I suggest that as not-suited-ness
> increases, performance falls off _faster_ for the GPU.)

Not always.

[shameless plug]

A project I have spent some time with is showing 117x on a 3-GPU machine 
over a single core of a host machine (3.0 GHz Opteron 2222).  The code 
is mpihmmer, and the GPU version of it.  See http://www.mpihmmer.org for 
more details.  Ping me offline if you need more info.

[/shameless plug]

>> what I understand GPUs are useful only with certain classes of numerical
>> problems and discretization schemes, and of course the code must be
> I think it's fair to say that GPUs are good for graphics-like loads,

... not entirely true.  We are seeing good performance with a number of 
calculations that share similar features.  Some will not work well on 
GPUs, notably those with lots of deeply nested if-then or other 
conditional constructs.  If you can refactor these so that the 
conditionals are hoisted out of the inner loops, this is a good thing 
for GPUs.
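
To make the hoisting point concrete, here is a minimal sketch in plain C 
(not CUDA, and not taken from mpihmmer).  The function names, the 
row_mode array and the two arithmetic operations are all hypothetical, 
picked only to show the shape of the refactoring.  The assumption is 
that the condition depends only on the outer index, so it can be tested 
once per row instead of once per element.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the mode test sits inside the inner loop, so it is evaluated
 * for every element even though it cannot change within a row. */
void scale_rows_branchy(float *a, const int *row_mode,
                        size_t rows, size_t cols)
{
    assert(a != NULL && row_mode != NULL);
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            if (row_mode[i] == 0)
                a[i * cols + j] *= 2.0f;
            else
                a[i * cols + j] += 1.0f;
        }
    }
}

/* After: the test is hoisted to the outer loop, and each inner loop is
 * branch-free arithmetic over a contiguous row. */
void scale_rows_hoisted(float *a, const int *row_mode,
                        size_t rows, size_t cols)
{
    assert(a != NULL && row_mode != NULL);
    for (size_t i = 0; i < rows; i++) {
        float *row = a + i * cols;
        if (row_mode[i] == 0) {
            for (size_t j = 0; j < cols; j++)
                row[j] *= 2.0f;
        } else {
            for (size_t j = 0; j < cols; j++)
                row[j] += 1.0f;
        }
    }
}
```

Both versions compute the same result.  The second one gives the inner 
loop a single uniform instruction stream, which is the property a GPU 
(or a vectorizing compiler) wants.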

> or more generally: fairly small data, accessed data-parallel or with 
> very regular and limited sharing, with high work-per-data.

... not small data.  You can stream data.  High work per data is advisable 
on any NUMA-like machine with penalties for data motion (cache-based 
architectures, NUMA, MPI, ...).  You want as much data reuse as you can 
get, or to structure the stream to leverage the maximum bandwidth.
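
A rough sketch of what I mean by streaming with reuse, again in plain C 
with made-up names (CHUNK, stats_t, stream_stats are illustrative 
assumptions, not from any real code).  The data set can be far larger 
than any fast memory: it is walked once in fixed-size chunks, and every 
element that is loaded is used for more than one result before moving on.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK 4096  /* hypothetical chunk size, tune to the fast memory */

typedef struct {
    double sum;     /* running sum of the elements */
    double sum_sq;  /* running sum of the squared elements */
} stats_t;

/* Consume one chunk: each loaded element feeds both accumulators, so the
 * data is moved once and reused twice. */
static void accumulate_chunk(const float *chunk, size_t n, stats_t *s)
{
    for (size_t i = 0; i < n; i++) {
        double v = chunk[i];
        s->sum += v;
        s->sum_sq += v * v;
    }
}

/* Stream an array of arbitrary length through fixed-size chunks. */
stats_t stream_stats(const float *data, size_t n)
{
    stats_t s = {0.0, 0.0};
    assert(data != NULL || n == 0);
    for (size_t off = 0; off < n; off += CHUNK) {
        size_t len = (n - off < CHUNK) ? (n - off) : CHUNK;
        accumulate_chunk(data + off, len, &s);
    }
    return s;
}
```

On a real accelerator the chunk would be copied to device memory while 
the previous chunk is being processed.  That overlap is left out here 
to keep the sketch short.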


>> than others? Given the very substantial speed improvements with GPUs,
>> will there be a movement to GPU clusters, even if there is a substantial
>> cost in problem reformulation?  Or are GPUs only suitable for a rather
>> narrow range of numerical problems?
> GP-GPU tools are currently immature, and IMO the hardware probably needs 
> a generation of generalization before it becomes really widely used.

Hrmm...  CUDA is pretty good.  It still needs some polish, but people can 
use it, and are generating real apps from it.  We are seeing pretty wide 
use ... I guess the issue is what one defines as "wide".

> OTOH, GP-GPU has obviously drained much of the interest away from eg
> FPGA computation.  I don't know whether there is still enough interest

There is still some of it on the show floor.  Some things FPGAs do very 
well.  But the cost for this performance has been prohibitive, and GPUs 
are basically decimating the business model that has been in use for 

> in vector computers to drain anything...

Hmmm.... There is a (micro)vector machine in your CPU anyway.
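
For anyone who has not poked at it, a minimal sketch of using that 
vector unit directly, assuming an x86 CPU with SSE and a GCC- or 
Clang-style compiler that defines __SSE__.  The function name add_arrays 
is hypothetical.  Where SSE is not available the code falls back to the 
scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* out[i] = a[i] + b[i], four single-precision lanes at a time where
 * SSE is available, one element at a time otherwise. */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop handles the tail, or everything when SSE is absent. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```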



Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
