[Beowulf] Nvidia, cuda, tesla and... where's my double floating point?

Perry E. Metzger perry at piermont.com
Mon Jun 16 19:06:49 PDT 2008


"Perry E. Metzger" <perry at piermont.com> writes:
> ...that implies moving at least 3.7e9 bytes of data (MD5
> operates on blocks of 64 bytes) into the GPU per second, entirely
> ignoring the 64 Feistel rounds within the GPU. Each round is 4 xors
> and a rotate, and they can't be done in parallel, so we get a total of
> about 1.8e10 integer ops (entirely ignoring the word shuffling) per
> second. That's... rather a lot.

As an aside, dedicated IPSec hardware can keep up with HMAC-MD5 at
gigabit ethernet speeds, but I don't think anyone has shown hardware
capable of doing HMAC-MD5 faster than 10G ethernet. (I'm not even sure
anyone has hardware that will keep up with 10GigE.) Your friend is
claiming he can do better, about 30Gbit/sec, beating custom hardware
optimized purely for doing MD5. That would clearly be of great
interest to many people if it were true.
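
For anyone who wants to check the arithmetic, here is a rough
back-of-envelope sketch in Python. The constants are just the figures
from the quoted message (3.7e9 bytes/sec, 64-byte blocks, 64 rounds,
4 xors plus a rotate per round); the function and variable names are
made up for illustration, and the op count is the same crude estimate
as above, not a real model of MD5 or of any GPU. It works out to
roughly 5.8e7 blocks/sec, 1.85e10 integer ops/sec and 29.6 Gbit/sec.

```python
# Back-of-envelope check of the figures quoted in this thread.
# All constants come from the quoted message; names are illustrative only.

BYTES_PER_SEC = 3.7e9    # claimed data rate into the GPU
BLOCK_BYTES = 64         # MD5 operates on 64-byte blocks
ROUNDS_PER_BLOCK = 64    # rounds per block, as counted in the quoted message
OPS_PER_ROUND = 5        # 4 xors + 1 rotate, as counted in the quoted message


def md5_throughput_estimate(bytes_per_sec=BYTES_PER_SEC):
    """Return (blocks/sec, integer ops/sec, Gbit/sec) for a given byte rate."""
    blocks_per_sec = bytes_per_sec / BLOCK_BYTES
    # Rounds are serial within a block, so the ops simply multiply out.
    ops_per_sec = blocks_per_sec * ROUNDS_PER_BLOCK * OPS_PER_ROUND
    gbit_per_sec = bytes_per_sec * 8 / 1e9
    return blocks_per_sec, ops_per_sec, gbit_per_sec


if __name__ == "__main__":
    blocks, ops, gbit = md5_throughput_estimate()
    print(f"blocks/sec:      {blocks:.3g}")
    print(f"integer ops/sec: {ops:.3g}")
    print(f"line rate:       {gbit:.1f} Gbit/sec")
```

The line rate it prints is where the "about 30Gbit/sec" figure comes
from, i.e. roughly three times a 10G ethernet link.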

Perry


