[Beowulf] Nvidia, cuda, tesla and... where's my double floating point?

Prentice Bisbal prentice at ias.edu
Mon Jun 16 08:38:44 PDT 2008

Vincent Diepeveen wrote:
> That has to change in order to get GPU calculations more into the mainstream.
> When I calculate on paper for some applications, a GPU can potentially be a
> factor of 4-8 faster than a standard quad-core 2.4 GHz CPU is right now.
> Getting that performance out of the GPU is more than a full-time task,
> however, without in-depth technical hardware data on the GPU.

Completely untrue. One of my colleagues, who does a lot of work with GPUs
for astrophysics calculations, was able to increase the performance of
the MD5 algorithm by ~100x with about 1.5 days of work. He described the
code he wrote as "totally unoptimized, a straight CUDA C implementation
of Rivest's algorithm". He tinkered some more, adding some optimizations,
and I believe he ended up with a 350x performance improvement.

Here is the e-mail he sent me after his first round of coding:


The other day in NYC at the HPC-UG meeting someone mentioned that GPUs
would be perfect for password cracking, with which I wholeheartedly
agreed (on theoretical grounds). But theory is nothing without
experiment :), so I spent last night and this morning writing a
GPU MD5 hash routine (totally unoptimized, a straight CUDA C
implementation of Rivest's algorithm).

    The results?

* GPU (single GeForce 8800 Ultra on cylon):

    57,640,967.264473 hash/second

* The same algorithm on the CPU (Intel(R) Core(TM)2 Quad CPU  Q6700  @
2.66GHz on cylon):

    543,839.652381 hash/second

A factor of ~100 difference. Sweet.

Another point of comparison: the fastest, assembly-level optimized x86
MD5 code, running on a _dual_ 3.2 GHz Xeon (see
http://c3rb3r.openwall.net/mdcrack/) can do 42e6 hash/sec. And remember,
I wrote the CUDA code in a day and a half, with _no_ optimization. Nice.

In other words, one GPU card with amateurishly written MD5 code can
brute-force crack an 8-character MD5-hashed password consisting of
[0-9A-Za-z] in about 6 weeks. Now imagine if someone who knew what they
were doing optimized the code, and got a cluster of Teslas instead of the
single gaming card that I used....

Cool  :-)  .
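
For anyone who wants to check the arithmetic in the message above, here
is a small Python sketch. The hash rates are copied from the quoted
e-mail; the function and constant names are my own, and the estimate
assumes passwords of exactly 8 characters and an exhaustive search of
the whole keyspace.

```python
# Hash rates as reported in the quoted e-mail (hashes per second).
GPU_HASH_RATE = 57_640_967.264473   # single GeForce 8800 Ultra
CPU_HASH_RATE = 543_839.652381      # same algorithm on a Core 2 Quad Q6700
MDCRACK_HASH_RATE = 42e6            # assembly-optimized code, dual 3.2 GHz Xeon

# Assumptions for the brute-force estimate (illustrative names).
ALPHABET_SIZE = 10 + 26 + 26        # characters in [0-9A-Za-z]
PASSWORD_LENGTH = 8
SECONDS_PER_WEEK = 7 * 24 * 3600


def speedup(fast_rate, slow_rate):
    """Return how many times faster fast_rate is than slow_rate."""
    return fast_rate / slow_rate


def exhaustive_search_weeks(alphabet_size, length, hash_rate):
    """Weeks needed to hash every password of exactly `length` characters."""
    keyspace = alphabet_size ** length
    return keyspace / hash_rate / SECONDS_PER_WEEK


if __name__ == "__main__":
    print(f"GPU vs. CPU:     {speedup(GPU_HASH_RATE, CPU_HASH_RATE):.1f}x")
    print(f"GPU vs. MDCrack: {speedup(GPU_HASH_RATE, MDCRACK_HASH_RATE):.2f}x")
    weeks = exhaustive_search_weeks(ALPHABET_SIZE, PASSWORD_LENGTH, GPU_HASH_RATE)
    print(f"Full 8-character keyspace on one GPU: {weeks:.1f} weeks")
```

Note that this is the worst case: on average a matching password turns
up after searching about half the keyspace, so the expected time is
roughly half the figure printed.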


