NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point?

Tue Jun 17 11:53:08 PDT 2008

On Mon, Jun 16, 2008 at 06:58:40PM -0700, Bill Broadley wrote:

> Heh.  Is there a published linpack for some CUDA based solution?  Or 
> possibly code available?

http://www.hp.com/techservers/hpccn/hpccollaboration/ADCatalyst/downloads/accelerating-HPCUsing-GPUs.pdf

says that some folks at Berkeley wrote some SGEMM (single-precision
matrix multiply) code and that it achieves ~165 Gflops on a 4-GPU
Tesla setup. That's waaaaay down from the alleged peak of that box.
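To put a number on "waaaaay down", here is a back-of-the-envelope sketch that divides the achieved SGEMM rate by the box's aggregate peak. The per-GPU peak of ~500 Gflops (single precision) is an assumption, roughly what was quoted for G80-era Tesla boards; the PDF doesn't say which Tesla box was used, so substitute the real figure.

```python
def fraction_of_peak(achieved_gflops, n_gpus, peak_gflops_per_gpu):
    """Return achieved throughput as a fraction of aggregate theoretical peak."""
    return achieved_gflops / (n_gpus * peak_gflops_per_gpu)

# ASSUMPTION: ~500 Gflops single-precision peak per GPU; not from the PDF.
PEAK_PER_GPU_GFLOPS = 500.0

frac = fraction_of_peak(165.0, 4, PEAK_PER_GPU_GFLOPS)
print(f"{frac:.1%} of peak")  # 165 / (4 * 500) = 165 / 2000, i.e. about 8%
```

Under that assumption the measured SGEMM lands at under a tenth of the advertised peak.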

> I'm pretty sure hypertransport allows for a significant number of 
> outstanding memory transactions, so even a single gpu/cpu hybrid could farm 
> out a 100GB/sec memory system to numerous sockets.... sounds like a good 
> justification for HT3 to me.

HyperTransport is just a bus; it's the northbridge+CPU that determines
how many outstanding transactions you can have.
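Why the number of outstanding transactions matters for a "100GB/sec memory system" can be sketched with Little's law (a standard queueing result, not something stated in this thread): requests in flight = request rate x latency. The 64-byte line size and ~100 ns loaded latency below are illustrative assumptions, not measurements of any particular northbridge or CPU.

```python
import math

def outstanding_needed(bandwidth_bytes_per_s, latency_s, line_bytes):
    """Little's law: in-flight requests = throughput (requests/s) x latency (s)."""
    requests_per_s = bandwidth_bytes_per_s / line_bytes
    return requests_per_s * latency_s

# ASSUMPTIONS (hypothetical): 64-byte cache lines, ~100 ns memory latency.
n = outstanding_needed(100e9, 100e-9, 64)
print(f"~{math.ceil(n)} transactions in flight")  # 100e9 / 64 * 100e-9 = ~156.25
```

So sustaining 100 GB/s at that latency needs on the order of 150+ requests in flight at once, which is a property of whatever issues and tracks the requests (northbridge+CPU), not of the link itself.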

-- greg