> I'm happy for you, but to me, you're stacking the deck by comparing to a
> quite old CPU.  you could break out the prices directly, but comparing 3x
> GPU (modern?  sounds like pci-express at least)

Mark, all CUDA-capable cards are PCI-Express (off the top of my head).

> to a current entry-level cluster node (8 core2/shanghai cores at 2.4-3.4
> GHz) be more appropriate.
> at the VERY least, honesty requires comparing one GPU against all the cores
> in a current CPU chip.  with your numbers, I expect that would change the
> speedup from 117 to around 15.  still very respectable.
> I apologize for not RTFcode, but does the host version of hmmer you're
> comparing with vectorize using SSE?

Good question. I'd really like to see the numbers on this one also.
As is clear to the list, I'm really enthusiastic about CUDA. But as you say,
there's no point in that if your compiler/application could make equally
good use of current on-chip SSE (etc. etc.)

(Currently sitting in an Altix Performance and Tuning class, and my head is
spinning with this stuff.
Pun very much intended.)
