[Beowulf] What class of PDEs/numerical schemes suitable for GPU clusters

Håkon Bugge hbugge at platform.com
Fri Nov 21 00:55:05 PST 2008


Mark,

Guess you're too humble ;-)

At 17:23 20.11.2008, Mark Hahn wrote:
>I'm happy for you, but to me, you're stacking 
>the deck by comparing to a quite old CPU.  you 
>could break out the prices directly, but comparing 3x
>GPU (modern?  sounds like pci-express at least) 
>to a current entry-level cluster node (8 
>core2/shanghai cores at 2.4-3.4 GHz) would be more appropriate.
>
>at the VERY least, honesty requires comparing one GPU against all the cores
>in a current CPU chip.  with your numbers, I 
>expect that would change the speedup from 117 to 
>around 15.  still very respectable.

I compiled the serial hmm version using the default makefile
(gcc -O2 -g) and ran it on an Opteron 2220 (2.8 GHz). Then I
compiled the MPI version using Intel compiler 10.1
(icc -axS -O3) and ran it on a not-yet-released two-socket
machine using 16 MPI processes. The latter ran 145x faster.
So soon, the 15x is below 1x...
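The arithmetic behind that last remark can be sketched roughly as below. This is only an illustration: it assumes the 117x GPU figure and the 145x MPI figure were measured against comparable serial baselines, which this thread does not establish (the CPUs and compilers differ). The function and variable names are made up for the example.

```python
def relative_speedup(gpu_vs_serial: float, cpu_vs_serial: float) -> float:
    """Speedup of the GPU run over the tuned CPU run, given that both
    were measured against the same serial baseline (an assumption)."""
    return gpu_vs_serial / cpu_vs_serial


# Figures quoted in the thread; treating their baselines as comparable
# is an assumption made only for this illustration.
gpu_vs_serial = 117.0  # GPU speedup over the serial code
mpi_vs_serial = 145.0  # 16-process MPI run over the serial gcc -O2 build

ratio = relative_speedup(gpu_vs_serial, mpi_vs_serial)
print(f"GPU relative to the 16-process MPI node: {ratio:.2f}x")
```

A ratio below 1 means that, under these assumptions, the two-socket node finishes sooner than the GPU setup.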

So, YMWV!



Håkon






