[Beowulf] What class of PDEs/numerical schemes suitable for GPU clusters

Håkon Bugge hbugge at platform.com
Fri Nov 21 00:55:05 PST 2008


Guess you're too humble ;-)

At 17:23 20.11.2008, Mark Hahn wrote:
>I'm happy for you, but to me, you're stacking the deck by comparing
>to a quite old CPU.  you could break out the prices directly, but
>comparing 3x GPU (modern?  sounds like pci-express at least) to a
>current entry-level cluster node (8 core2/shanghai cores at
>2.4-3.4 GHz) would be more appropriate.
>at the VERY least, honesty requires comparing one GPU against all the cores
>in a current CPU chip.  with your numbers, I expect that would change
>the speedup from 117 to around 15.  still very respectable.

I compiled the serial hmm version using the 
default makefile (gcc -O2 -g) and ran it on an 
Opteron 2220 (2.8 GHz). Then I compiled the MPI 
version using Intel compiler 10.1 (icc -axS -O3) 
and ran it on a not-yet-released two-socket 
machine using 16 MPI processes. The latter ran 145 
times faster. So soon, the 15x is below 1x...



