[Beowulf] Nvidia, cuda, tesla and... where's my double floating point?

Perry E. Metzger perry at piermont.com
Mon Jun 16 18:48:55 PDT 2008


Prentice Bisbal <prentice at ias.edu> writes:
> Completely untrue. One of my colleagues, who does a lot of work with GPU
> processors for astrophysics calculations, was able to increase the
> performance of the MD5 algorithm by ~100x with about 1.5 days of work.

That's rather surprising. MD5 is a pure integer algorithm, and is well
known for being unfriendly to vectorization. There is also extensive
work by Keromytis et al. on the use of GPUs for accelerating
cryptographic operations, and I don't think they achieved anything
like that sort of performance improvement.

I'll point out, by the way...

>* GPU (single GeForce 8800 Ultra on cylon):
>    57,640,967.264473 hash/second

...that implies moving at least 3.7e9 bytes of data (MD5
operates on blocks of 64 bytes) into the GPU per second, entirely
ignoring the 64 Feistel rounds within the GPU. Each round is 4 xors
and a rotate, and they can't be done in parallel, so we get a total of
about 1.8e10 integer ops (entirely ignoring the word shuffling) per
second. That's... rather a lot.
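
For anyone who wants to check the arithmetic, here is a minimal sketch of
the estimate above. It assumes each hash consumes exactly one 64-byte
block, and it takes the 5 ops per round (4 xors plus a rotate) as the
rough count used above, not an exact instruction count. The function and
constant names are made up for illustration.

```python
# Back-of-the-envelope check of the throughput implied by the quoted benchmark.
# Assumptions (labeled, not measured): one 64-byte block per hash, 64 rounds
# per block, and roughly 5 integer ops per round (4 xors + 1 rotate).

HASHES_PER_SECOND = 57_640_967.264473  # figure quoted in the benchmark above
BLOCK_BYTES = 64                       # MD5 block size in bytes
ROUNDS_PER_BLOCK = 64                  # rounds per block, as counted above
OPS_PER_ROUND = 5                      # rough estimate: 4 xors + 1 rotate


def md5_throughput_estimate(hashes_per_second,
                            block_bytes=BLOCK_BYTES,
                            rounds=ROUNDS_PER_BLOCK,
                            ops_per_round=OPS_PER_ROUND):
    """Return (bytes/s fed to the GPU, integer ops/s) for a given hash rate."""
    bytes_per_second = hashes_per_second * block_bytes
    ops_per_second = hashes_per_second * rounds * ops_per_round
    return bytes_per_second, ops_per_second


bytes_per_s, ops_per_s = md5_throughput_estimate(HASHES_PER_SECOND)
print(f"{bytes_per_s:.2e} bytes/s")
print(f"{ops_per_s:.2e} integer ops/s")
```

If the inputs being hashed are shorter than a block and generated on the
card itself, the bytes-per-second figure would not have to cross the bus,
but the op count would stay the same under these assumptions.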

Perry
-- 
Perry E. Metzger		perry at piermont.com


