[Beowulf] What class of PDEs/numerical schemes suitable for GPU clusters
Håkon Bugge
hbugge at platform.com
Fri Nov 21 00:55:05 PST 2008
Mark,
Guess you're too humble ;-)
At 17:23 20.11.2008, Mark Hahn wrote:
>I'm happy for you, but to me, you're stacking
>the deck by comparing to a quite old CPU. you
>could break out the prices directly, but comparing 3x
>GPU (modern? sounds like pci-express at least)
>to a current entry-level cluster node (8
>core2/shanghai cores at 2.4-3.4 GHz) would be more appropriate.
>
>at the VERY least, honesty requires comparing one GPU against all the cores
>in a current CPU chip. with your numbers, I
>expect that would change the speedup from 117 to
>around 15. still very respectable.
I compiled the serial hmm version using the
default make file (gcc -O2 -g) and ran it on an
Opteron 2220 (2.8 GHz). Then I compiled the MPI
version using Intel compiler 10.1 (icc -axS -O3),
and ran it on a not-yet-released two-socket
machine using 16 MPI processes. The latter ran 145
times faster. So soon, the 15x is below 1x...
So, YMWV!
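
To spell out the arithmetic behind "below 1x", here is a rough sketch. It takes the 117x GPU speedup quoted above and the 145x I measured, and assumes (a big assumption) that both are relative to comparable serial baselines. The variable names are just for illustration.

```python
# Speedups relative to a serial run; both figures are quoted in this thread.
gpu_speedup = 117.0  # GPU vs. serial CPU run (figure quoted by Mark)
mpi_speedup = 145.0  # 16-process MPI run vs. serial gcc -O2 run

# Assumption: the two serial baselines are comparable. They were measured
# on different machines, so treat the ratio as indicative only.
gpu_vs_node = gpu_speedup / mpi_speedup

print(f"GPU relative to the 16-process node: {gpu_vs_node:.2f}x")
```

Under that assumption the ratio comes out at roughly 0.8, i.e. the GPU ends up somewhat slower than the tuned MPI run on the whole node.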
Håkon