[Beowulf] Nvidia, cuda, tesla and... where's my double floating point?
Prentice Bisbal
prentice at ias.edu
Mon Jun 16 08:38:44 PDT 2008
Vincent Diepeveen wrote:
>
> That has to change in order to get GPU calculations more into mainstream.
>
> When i calculate on paper for some applications, a GPU can be potentially
> factor 4-8 faster than a standard quadcore 2.4ghz is right now.
>
> Getting that performance out of the GPU is more than a fulltime task
> however, without having indepth technical hardware data on the GPU.
Completely untrue. One of my colleagues, who does a lot of work with GPUs
for astrophysics calculations, was able to increase the performance of the
MD5 algorithm by ~100x with about 1.5 days of work. He called the code he
wrote "totally unoptimized, a straight CUDA C implementation of Rivest's
algorithm". He tinkered some more, adding some optimizations, and I believe
he ended up with a 350x performance improvement.
Here is the e-mail he sent me after his first round of coding:
<quote>
The other day in NYC at the HPC-UG meeting someone mentioned that GPUs
would be perfect for password cracking, with which I wholeheartedly
agreed (on theoretical grounds). But theory is nothing without
experiment :) , so I spent the last night and this morning writing a
GPU MD5 hash routine (totally unoptimized, a straight CUDA C
implementation of Rivest's algorithm).
The results?
* GPU (single GeForce 8800 Ultra on cylon):
    57,640,967.264473 hash/second
* The same algorithm on the CPU (Intel(R) Core(TM)2 Quad CPU Q6700 @ 2.66GHz
  on cylon):
    543,839.652381 hash/second
A factor of ~100 difference. Sweet.
Another point of comparison: the fastest, assembly-level optimized x86
MD5 code, running on a _dual_ 3.2 GHz Xeon (see
http://c3rb3r.openwall.net/mdcrack/) can do 42e6 hash/sec. And remember,
I wrote the CUDA code in a day and a half, with _no_ optimization. Nice.
In other words, one GPU card with an amateurishly written MD5 code can
brute-force crack an 8-character MD5 hashed password consisting of
[0-9A-Za-z] in about 6 weeks. Now imagine if someone who knew what they
were doing optimized the code, and got a cluster of Teslas instead of the
single gaming card that I used....
Cool :-) .
</quote>
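The numbers in the quote can be checked with a little arithmetic. A minimal
sketch in Python, assuming one MD5 evaluation per candidate password,
passwords of exactly 8 characters, and a worst-case search of the whole
keyspace; the function names are made up for illustration:

```python
# Hash rates as quoted in the e-mail above.
GPU_HASHES_PER_SEC = 57_640_967.264473  # single GeForce 8800 Ultra
CPU_HASHES_PER_SEC = 543_839.652381     # Core 2 Quad Q6700, same algorithm

ALPHABET_SIZE = 10 + 26 + 26  # characters in [0-9A-Za-z]
PASSWORD_LENGTH = 8           # assumption: exactly 8 characters, no shorter ones

SECONDS_PER_WEEK = 7 * 24 * 3600


def speedup(gpu_rate, cpu_rate):
    """Ratio of GPU to CPU hash throughput."""
    return gpu_rate / cpu_rate


def keyspace(alphabet_size, length):
    """Number of distinct passwords of the given length."""
    return alphabet_size ** length


def exhaustive_search_weeks(candidates, hashes_per_sec):
    """Weeks needed to hash every candidate once (worst case)."""
    return candidates / hashes_per_sec / SECONDS_PER_WEEK


if __name__ == "__main__":
    total = keyspace(ALPHABET_SIZE, PASSWORD_LENGTH)
    print(f"speedup:    {speedup(GPU_HASHES_PER_SEC, CPU_HASHES_PER_SEC):.1f}x")
    print(f"keyspace:   {total:,} candidates")
    print(f"worst case: {exhaustive_search_weeks(total, GPU_HASHES_PER_SEC):.2f} weeks")
```

With the quoted rates this gives a speedup of roughly 106x and a little over
6 weeks to exhaust the keyspace, which matches the "~100" and "about 6 weeks"
in the e-mail. On average a match would turn up after about half that time.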
--
Prentice