[Beowulf] What class of PDEs/numerical schemes suitable for GPU clusters

Mark Hahn hahn at mcmaster.ca
Fri Nov 21 06:05:54 PST 2008

>>> Virtually any recent card can run CUDA code. If you Google you can get a
>>> list of compatible cards.
>> not that many NVidia cards support DP yet though, which is probably
>> important to anyone coming from the normal HPC world...  there's some
>> speculation that NV will try to keep DP as a market segmentation
>> feature to drive HPC towards high-cost Tesla cards, much as vendors
>> have traditionally tried to herd high-end vis into 10x priced cards.
> That's simply not true. Every newer card from NVidia (that is, every

which part is not true?  the speculation?  OK - speculation is always 
just speculation.  it _is_ true that only the very latest NV generation,
essentially three bins of one card, supports DP.

> and nothing indicates that NV will remove support in future cards, quite
> the contrary.

hard to say.  NV is a very competitively driven company, that is, it 
makes decisions for competitive reasons.  it's a very standard policy to try
to segment your market, to develop higher margin segments that depend 
on restricted features.  certainly NV has done that before (hence the 
existence of Quadro and Tesla) though it's not clear to me whether they 
will have any meaningful success given the other players in the market.
segmentation is a play for a dominant incumbent, and I don't think NV
is or believes itself so.  AMD obviously seeks to avoid giving NV any
advantage, and ATI has changed its outlook somewhat since AMDification.
and Larrabee threatens to eat both their lunches.

> The distinction between Tesla and GeForce cards is that the former have
> no display output, they usually have more ram, and (but I'm not sure
> about this one) they are clocked a little lower.

both NV and ATI have always tried to segment "professional graphics"
into a higher-margin market.  this involves tying the pro drivers to 
features found only in the pro cards.  it's obvious that NV _could_ 
do this with Cuda, though I agree they probably won't.

the original question was whether there is a strong movement towards 
gp-gpu clusters.  I think there is not, because neither the hardware 
nor software is mature.  Cuda is the main software right now, and is 
NV-proprietary, and is unlikely to target ATI and Intel gp-gpu hardware.

finally, it needs to be said again: current gp-gpus deliver around 
1 SP Tflop for around 200W.  a current cpu (3.4 GHz Core2) delivers 
about 1/10 as many flops for something like 1/2 the power.  (I'm 
approximating cpu+nb+ram.)  cost for the cpu approach is higher (let's
guess 2x, but again it's hard to isolate parts of a system.)
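to make the arithmetic above concrete, here's a back-of-envelope sketch 
using exactly the rough figures quoted (the 200 W, 1 Tflop, 1/10, and 1/2 
numbers are my 2008-era estimates, not measurements; the script just 
derives the ratios from them):

```python
# illustrative back-of-envelope only; input figures are rough estimates
# from the discussion above, not benchmarks.

gpu_sp_tflops = 1.0    # ~1 SP Tflop for a current gp-gpu card
gpu_watts     = 200.0  # ~200 W for that card

cpu_sp_tflops = 0.1    # ~1/10 the flops for a 3.4 GHz Core2
cpu_watts     = 100.0  # ~1/2 the gpu's power (approximating cpu+nb+ram)

flops_ratio = gpu_sp_tflops / cpu_sp_tflops              # raw peak advantage
perf_per_watt_ratio = (gpu_sp_tflops / gpu_watts) / \
                      (cpu_sp_tflops / cpu_watts)        # efficiency advantage

print(f"peak flops advantage:   ~{flops_ratio:.0f}x")
print(f"flops/W advantage:      ~{perf_per_watt_ratio:.0f}x")
```

with these inputs the gpu comes out roughly 10x ahead on peak flops but 
only about 5x ahead on flops/W, which is the "one order of magnitude, 
peak/theoretical" gap discussed below.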

so we're left with a peak/theoretical difference of around 1 order of
magnitude.  that's great!  more than enough to justify use of a unique
(read proprietary, nonportable) development tool for some places where 
GPUs work especially well (and/or CPUs work poorly).  and yes, adding
gp-gpu cards to a cluster is a fairly modest price/power premium if 
you expect to use it.

Joe's hmmer port sounds like an excellent example, since it shows good 
speedup, and the application seems to be well-suited to gp-gpu strengths
(and it has a fairly small kernel that needs to be ported to Cuda.)
but comparing all the cores of a July 2008 GPU card to a single core on a
90-nm, n-3 generation chip really doesn't seem appropriate to me.

More information about the Beowulf mailing list