[Beowulf] What class of PDEs/numerical schemes suitable for GPU clusters

Fri Nov 21 07:08:52 PST 2008

Hello Jan,

On Fri, 2008-11-21 at 15:23 +0100, Jan Heichler wrote:
> Hello Franz,
> 
> On Friday, 21 November 2008, you wrote:
> 
> FM> That's simply not true. Every newer card from NVidia (that is, every
> FM> G200-based card, right now, GTX260, GTX260-216 and GTX280) supports DP,
> FM> and nothing indicates that NV will remove support in future cards, quite
> FM> the contrary.
> 
> FM> The distinction between Tesla and GeForce cards is that the former have
> FM> no display output, they usually have more ram, and (but I'm not sure
> FM> about this one) they are clocked a little lower.
> 
> 
> Don't forget that Teslas have ECC RAM. Normal graphics cards don't care
> about flipped memory bits. That does not matter when processing DirectX
> or OpenGL, but it does for computation. So a high-end GPU can
> miscalculate...

Yes, really... ;)

Yeah, that's an advantage I was forgetting about, and for cluster use,
or a multi-GPU system in a deskside computer, it could really matter... 
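
To make Jan's point concrete, here is a minimal sketch of what a single
flipped bit does to a single-precision value. It is only an illustration in
plain Python: the helper name flip_bit is made up for this example, and
nothing here talks to a GPU or models how ECC actually detects the error.

```python
import math
import struct

def flip_bit(x: float, bit: int) -> float:
    """Return x, stored as an IEEE 754 single, with one bit inverted."""
    # Reinterpret the 32-bit float as an unsigned integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", x))
    # Invert the chosen bit and reinterpret the result as a float again.
    (y,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return y

x = 1.0
low = flip_bit(x, 0)    # lowest mantissa bit: off by one unit in the last place
high = flip_bit(x, 30)  # highest exponent bit: the value is no longer finite

print(low - x)           # tiny error, invisible in a rendered pixel
print(math.isinf(high))  # the same kind of fault, fatal for a computation
```

Which bit gets hit is a matter of luck, which is why the same fault can go
unnoticed in rendering and still ruin a long-running calculation.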

In order not to flood the list with answers, I'm gonna answer Mark here,
too:

On Fri, 2008-11-21 at 09:05 -0500, Mark Hahn wrote: 
> > and nothing indicates that NV will remove support in future cards, quite
> > the contrary.
> 
> hard to say.  NV is a very competitively driven company, that is, makes 
> decisions for competitive reasons.  it's a very standard policy to try
> to segment your market, to develop higher margin segments that depend 
> on restricted features.  certainly NV has done that before (hence the 
> existence of Quadro and Tesla) though it's not clear to me whether they 
> will have any meaningful success given the other players in the market.
> segmentation is a play for a dominant incumbent, and I don't think NV
> is or believes itself so.  AMD obviously seeks to avoid giving NV any
> advantage, and ATI has changed its outlook somewhat since AMDification.
> and Larrabee threatens to eat both their lunches.
> 
> > The distinction between Tesla and GeForce cards is that the former have
> > no display output, they usually have more ram, and (but I'm not sure
> > about this one) they are clocked a little lower.
> 
> both NV and ATI have always tried to segment "professional graphics"
> into a higher-margin market.  this involves tying the pro drivers to 
> features found only in the pro cards.

True, although, as far as I remember, the only real distinction between
Quadro and GeForce cards is hardware support for antialiased lines, which
is present in the former (I could be wrong though, and there may be some
more substantial differences)...

>   it's obvious that NV _could_ 
> do this with Cuda, though I agree they probably won't.
> 
> the original question was whether there is a strong movement towards 
> gp-gpu clusters.  I think there is not, because neither the hardware 
> nor software is mature.  Cuda is the main software right now, and is 
> NV-proprietary, and is unlikely to target ATI and Intel gp-gpu hardware.
> 
> finally, it needs to be said again: current gp-gpus deliver around 
> 1 SP Tflop for around 200W.  a current cpu (3.4 GHz Core2) delivers 
> about 1/10 as many flops for something like 1/2 the power.  (I'm 
> approximating cpu+nb+ram.)  cost for the cpu approach is higher (let's
> guess 2x, but again it's hard to isolate parts of a system.)
> 
> so we're left with a peak/theoretical difference of around 1 order of
> magnitude.  that's great!  more than enough to justify use of a unique
> (read proprietary, nonportable) development tool for some places where 
> GPUs work especially well (and/or CPUs work poorly).  and yes, adding
> gp-gpu cards to a cluster is a fairly modest price/power premium if 
> you expect to use it.
> 
> Joe's hmmer example sounds like an excellent example, since it shows good 
> speedup, and the application seems to be well-suited to gp-gpu strengths
> (and it has a fairly small kernel that needs to be ported to Cuda.)
> but comparing all the cores of a July 2008 GPU card to a single core on a
> 90-nm, n-3 generation chip really doesn't seem appropriate to me.

I think we can agree on all these points, although I'm sure that Joe's
comparison, or, better, the cpu Joe used in the comparison, was not a
deliberate choice to somehow make the GPU version stand out more.
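
For what it's worth, Mark's back-of-the-envelope figures can be put side by
side in a few lines. This is only a sketch of the arithmetic: the numbers are
his approximations from the quoted text (about 1 SP Tflop at about 200W for
the GPU, about 1/10 the flops at about 1/2 the power for the cpu), not
measurements, and the variable names are invented for the example. His cost
guess is left out, since it is hard to tell which parts of a system it covers.

```python
# Rough peak figures quoted above (approximations, not benchmarks).
gpu_gflops = 1000.0            # "around 1 SP Tflop"
gpu_watts = 200.0              # "around 200W"
cpu_gflops = gpu_gflops / 10   # "about 1/10 as many flops"
cpu_watts = gpu_watts / 2      # "something like 1/2 the power"

# Peak throughput ratio: the "1 order of magnitude" in the quote.
peak_ratio = gpu_gflops / cpu_gflops

# Peak throughput per watt for each, and the resulting ratio.
gpu_eff = gpu_gflops / gpu_watts
cpu_eff = cpu_gflops / cpu_watts
eff_ratio = gpu_eff / cpu_eff

print(f"peak ratio:        {peak_ratio:.0f}x")
print(f"GPU GFLOPS/W:      {gpu_eff:.1f}")
print(f"CPU GFLOPS/W:      {cpu_eff:.1f}")
print(f"efficiency ratio:  {eff_ratio:.0f}x")
```

With these inputs the order of magnitude applies to peak flops, while the
advantage per watt comes out at half of that. Both are theoretical peaks,
so real applications will land somewhere below either figure.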

Regarding the proprietary nature of CUDA, I would argue that being
proprietary also means that it probably targets the NV GPU architecture
better, and that a more general, portable solution like OpenCL (which
seems to be closer than expected, by the way) will possibly mean a
somewhat less optimal use of the GPU. Maybe I'm wrong, though. I guess
we will just have to wait a few more months to find out :)

I'm gonna get back to some real work now, have a good day,

F.

---------------------------------------------------------
Franz Marini
Prof. R. A. Broglia Theoretical Physics of Nuclei,
      Atomic Clusters and Proteins Research Group
Dept. of Physics, University of Milan, Italy.
email : franz.marini at mi.infn.it
phone : +39 02 50317226
---------------------------------------------------------