[Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists"

Kilian CAVALOTTI kilian at stanford.edu
Thu Jun 19 17:16:41 PDT 2008


On Thursday 19 June 2008 04:32:11 pm Chris Samuel wrote:
> ----- "Kilian CAVALOTTI" <kilian at stanford.edu> wrote:
> > AFAIK, the multi GPU Tesla boxes contain up to 4 Tesla processors,
> > but are hooked to the controlling server with only 1 PCIe link,
> > right? Does this spell like "bottleneck" to anyone?
>
> The nVidia website says:
>
> http://www.nvidia.com/object/tesla_tech_specs.html
>
> # 6 GB of system memory (1.5 GB dedicated memory per GPU)

The latest S1070 has even more than that: 4 GB per GPU, it seems, 
according to [1].

But I think this refers to the "global memory", as described in [2] 
(slide 12, "Kernel Memory Access"). It's the graphics card's main 
memory, the kind used to store textures in games, for instance. Each 
GPU multiprocessor also has what they call "shared memory", which is 
really only shared between threads running on that multiprocessor 
(it's more like an L2 cache, actually).
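
Roughly how the two show up in CUDA C, as far as I can tell (a toy 
sketch, kernel name and sizes made up, assuming one 256-thread block 
per 256 elements):

    /* 'data' lives in the card's global memory; 'buf' sits in the
       on-chip shared memory and is only visible to threads of the
       same block. */
    __global__ void scale(float *data, float factor)
    {
        __shared__ float buf[256];
        int i = blockIdx.x * blockDim.x + threadIdx.x;

        buf[threadIdx.x] = data[i];            /* global -> shared */
        __syncthreads();                       /* sync the block    */
        data[i] = buf[threadIdx.x] * factor;   /* shared -> global  */
    }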

> So my guess is that you'd be using local RAM not the
> host systems RAM whilst computing.

Right, but at some point, you do need to transfer data from the host 
memory to the GPU memory, and back. That's where the bottleneck 
probably is, if all 4 GPUs want to read or dump data from/to the host 
at the same time.
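
For what it's worth, those transfers are explicit in the CUDA API, and 
each of the copies below crosses the PCIe link. A minimal sketch, 
assuming h_buf and nbytes are already set up on the host:

    float *d_buf;
    cudaMalloc((void **)&d_buf, nbytes);                  /* card memory */
    cudaMemcpy(d_buf, h_buf, nbytes,
               cudaMemcpyHostToDevice);                   /* host -> GPU */
    /* ... kernel launches working on d_buf ... */
    cudaMemcpy(h_buf, d_buf, nbytes,
               cudaMemcpyDeviceToHost);                   /* GPU -> host */
    cudaFree(d_buf);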

Moreover, I don't think that the different GPUs can work together, 
i.e. exchange data and participate in the same parallel computation. 
Unless they release something along the lines of a CUDA-MPI, those 4 
GPUs sitting in the box would have to be considered independent 
processing units. So as I understand it, the scaling benefits from 
your application's parallelization would be limited to one GPU, no 
matter how many you have hooked up to your machine.
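
To make that concrete, here is a rough sketch of what exchanging a 
buffer between two of those GPUs would look like today, if I read the 
docs right: bounce it through host memory, i.e. two PCIe crossings per 
exchange. The runtime appears to bind one device per host thread, so 
each GPU gets its own pthread (all names made up):

    #include <cuda_runtime.h>
    #include <pthread.h>
    #include <stdlib.h>

    #define NBYTES (1 << 20)
    static float *h_staging;             /* host bounce buffer */
    static pthread_barrier_t barrier;

    static void *gpu0_side(void *arg)    /* this thread owns GPU 0 */
    {
        float *d_src;
        cudaSetDevice(0);
        cudaMalloc((void **)&d_src, NBYTES);
        /* ... kernels fill d_src ... */
        cudaMemcpy(h_staging, d_src, NBYTES, cudaMemcpyDeviceToHost);
        pthread_barrier_wait(&barrier);  /* staging buffer now valid */
        cudaFree(d_src);
        return NULL;
    }

    static void *gpu1_side(void *arg)    /* this thread owns GPU 1 */
    {
        float *d_dst;
        cudaSetDevice(1);
        cudaMalloc((void **)&d_dst, NBYTES);
        pthread_barrier_wait(&barrier);  /* wait for GPU 0's copy-out */
        cudaMemcpy(d_dst, h_staging, NBYTES, cudaMemcpyHostToDevice);
        /* ... kernels consume d_dst ... */
        cudaFree(d_dst);
        return NULL;
    }

    int main(void)
    {
        pthread_t t0, t1;
        h_staging = (float *) malloc(NBYTES);
        pthread_barrier_init(&barrier, NULL, 2);
        pthread_create(&t0, NULL, gpu0_side, NULL);
        pthread_create(&t1, NULL, gpu1_side, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        free(h_staging);
        return 0;
    }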

I'm also not sure how you choose (or whether you can choose at all) 
which GPU your code gets executed on. It has to be handled by the 
driver on the host machine somehow.
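
That said, a quick look at the runtime reference suggests the basic 
knob is there, set per host thread; a minimal sketch:

    int ndev = 0;
    cudaGetDeviceCount(&ndev);  /* how many CUDA devices the driver sees */
    if (ndev > 1)
        cudaSetDevice(1);       /* bind this host thread to GPU #1; later
                                   cudaMalloc()s and launches go there   */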

> There's a lot of fans there..

They probably get hot. At least the G80-based ones do. They say 
"Typical Power Consumption: 700W" for the 4-GPU box. Given that a 
modern gaming rig featuring a pair of 8800 GTXs in SLI already calls 
for a 1 kW PSU, I would put that figure on the optimistic side.

[1] http://www.nvidia.com/object/tesla_s1070.html
[2] http://www.mathematik.uni-dortmund.de/~goeddeke/arcs2008/C1_CUDA.pdf


Cheers,
-- 
Kilian


