[Beowulf] Nvidia, cuda, tesla and... where's my double floating point?

Mark Hahn hahn at mcmaster.ca
Mon Jun 16 20:32:00 PDT 2008

> That's rather surprising. MD5 is a pure integer algorithm, and is well
> known for being unfriendly to vectorization. There is also extensive

the quoted figure is throughput, not latency, so perhaps
they were simply computing many independent MD5s at once.
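
a minimal sketch of that point, in Python rather than CUDA: one MD5 is
serial internally, but separate messages share no state, so a batch can be
split across any number of workers. the function names and the thread pool
are assumptions for illustration, not the benchmark's code, and threads in
CPython are not expected to give a real speedup here. the sketch only shows
that the chunked result matches the sequential one.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def md5_batch(messages):
    """Hash each message on its own; no digest depends on any other."""
    return [hashlib.md5(m).hexdigest() for m in messages]


def md5_batch_chunked(messages, workers=4):
    """Split the batch into chunks and hash the chunks concurrently.

    Hypothetical stand-in for "one lane per hash": because the messages
    are independent, any partitioning gives the same digests.
    """
    size = max(1, -(-len(messages) // workers))  # ceiling division
    chunks = [messages[i:i + size] for i in range(0, len(messages), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(md5_batch, chunks)  # map() keeps chunk order
        return [digest for chunk in results for digest in chunk]


if __name__ == "__main__":
    msgs = [i.to_bytes(8, "little") for i in range(1000)]
    print(md5_batch_chunked(msgs) == md5_batch(msgs))
```

with enough independent lanes, aggregate hash/second rises even though the
time to finish any single hash does not improve.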

>> * GPU (single GeForce 8800 Ultra on cylon):
>>    57,640,967.264473 hash/second
> ...that implies moving at least 3.7e9 bytes of data (MD5
> operates on blocks of 64 bytes) into the GPU per second, entirely

the only comparable transfer rate I've heard quoted comes from parallel
rendering/compositing, where fairly recent systems bragged about
reading the framebuffer back at 2 GB/s.
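
for reference, the arithmetic behind the 3.7e9 figure, set against the
2 GB/s readback number. assumptions: every hash consumes one distinct
64-byte block shipped over from the host, and 1 GB is taken as 1e9 bytes.
note that readback runs GPU-to-host, while the MD5 estimate is for
host-to-GPU traffic, so this is only a rough comparison.

```python
HASH_RATE = 57_640_967.264473   # hash/second, as quoted for the 8800 Ultra
BLOCK_BYTES = 64                # MD5 processes its input in 64-byte blocks
READBACK_BYTES_PER_S = 2e9      # quoted framebuffer readback, assuming 1 GB = 1e9 bytes


def required_bandwidth(hash_rate, bytes_per_hash=BLOCK_BYTES):
    """Bytes/second needed if each hash is fed one block from the host."""
    return hash_rate * bytes_per_hash


bw = required_bandwidth(HASH_RATE)
ratio = bw / READBACK_BYTES_PER_S

if __name__ == "__main__":
    print(f"{bw:.3e} bytes/s, {ratio:.2f}x the readback figure")
```

under those assumptions the implied input rate is a bit under twice the
readback figure. if the inputs were not all transferred from the host,
the estimate would not apply.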

