[Beowulf] GPU question

Mon Aug 31 06:15:07 PDT 2009

On Mon, Aug 31, 2009 at 7:40 AM, Jonathan
Aquilina <eagles051387 at gmail.com> wrote:
> One thing that has yet to be mentioned is what kind of GPU we are talking
> about. Depending on the problem, and if you are building the cluster from
> scratch, would Tesla GPUs be better for a GPU-based cluster, as they are
> meant for high-performance computing?

Tesla (10 series) solutions have a large amount of memory (4GB per GPU)
and no display output. In terms of FLOPS the GeForce 2xx series is
also great (but with less memory per GPU). The Tesla C1060 cards that
I work with generate a lot of heat (they need about 200W per card), so
that is an issue to take care of (I have a cold, ~16 Celsius, airflow
at the front of the PC).
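As a rough planning aid for cooling, the heat a node dumps into the room can be estimated from the per-card figure above. This is a minimal sketch that assumes all electrical power ends up as heat and uses the ~200W per card quoted above; the host PC's draw (host_watts) is a hypothetical parameter you would fill in for your own machine.

```python
# Back-of-the-envelope heat load for a GPU node.
# Assumption: every watt drawn is dissipated as heat in the room.

WATTS_TO_BTU_PER_HOUR = 3.412  # 1 W of dissipation is about 3.412 BTU/h


def heat_load(cards, watts_per_card=200.0, host_watts=0.0):
    """Return (watts, BTU/h) for a node with `cards` GPUs plus its host PC."""
    watts = cards * watts_per_card + host_watts
    return watts, watts * WATTS_TO_BTU_PER_HOUR


if __name__ == "__main__":
    # host_watts=300.0 is a made-up example value, not a measurement.
    w, btu = heat_load(cards=4, host_watts=300.0)
    print(f"{w:.0f} W ~ {btu:.0f} BTU/h")
```

Multiply by the number of nodes to size the room's cooling; the per-card number is the one worth checking against your own cards.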

The Tesla S1070 is a 4-GPU "all in one" 1U rack unit, so I guess it's
probably the optimal solution from a management point of view (don't
forget the host PC too). The problem with that much data on the GPUs
(16GB in total) is the bottleneck at the PCI Express bus that you may
hit if you need to transfer big chunks of data frequently, or if the
code on GPU A is supposed to interact with GPU B/C/D.
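To get a feel for that bottleneck, a quick estimate of transfer time helps. This is a minimal sketch under two stated assumptions: a sustained host-device bandwidth that you should measure yourself (the 5 GB/s default is purely illustrative, not a figure for any particular card), and a GPU-to-GPU exchange that is staged through host memory, i.e. one device-to-host copy followed by one host-to-device copy.

```python
# Rough PCIe transfer-time estimates for host<->GPU and GPU<->GPU copies.
# Assumptions: constant sustained bandwidth; GPU-to-GPU data goes via the host.

GB = 1e9  # decimal gigabytes, matching how link bandwidth is usually quoted


def host_device_seconds(nbytes, bandwidth_gbps=5.0):
    """Time for one host<->device copy at an assumed sustained bandwidth (GB/s)."""
    return nbytes / (bandwidth_gbps * GB)


def gpu_to_gpu_seconds(nbytes, bandwidth_gbps=5.0):
    """Staged copy GPU A -> host -> GPU B: two sequential transfers."""
    return 2 * host_device_seconds(nbytes, bandwidth_gbps)


if __name__ == "__main__":
    four_gb = 4 * GB  # roughly one GPU's worth of memory (decimal approximation)
    print(f"fill one GPU: {host_device_seconds(four_gb):.1f} s")
    print(f"move it from GPU A to GPU B: {gpu_to_gpu_seconds(four_gb):.1f} s")
```

If these times are comparable to (or larger than) the kernel time per iteration, the bus, not the GPU, sets the pace of the computation.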

One thing that NVIDIA doesn't say out loud (I have read it only in the
CUDA Programming Guide) is that if the display system needs more memory
than is available (say you change the resolution while you're waiting
for your process to finish), it can crash your CUDA app. So I advise
you to use a second card for the display (if you have a Tesla solution,
you certainly have a "second" display card). If you are running
remotely, this is a non-issue (the framebuffer doesn't need much memory
and doesn't change resolution).
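With two cards in the same box, the CUDA process should run on the card that is not driving the monitor. The sketch below illustrates one heuristic for picking it, assuming you have already queried each device (for example with cudaGetDeviceProperties) into a plain record. The GpuInfo type and pick_compute_gpu function are hypothetical; the fields mirror real cudaDeviceProp members (name, totalGlobalMem, kernelExecTimeoutEnabled), and treating an enabled kernel watchdog as a sign of an attached display is a rule of thumb, not a guarantee.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GpuInfo:
    # Hypothetical record filled from a device-properties query.
    index: int
    name: str                          # mirrors cudaDeviceProp.name
    total_global_mem: int              # bytes; mirrors cudaDeviceProp.totalGlobalMem
    kernel_exec_timeout_enabled: bool  # mirrors cudaDeviceProp.kernelExecTimeoutEnabled


def pick_compute_gpu(gpus: List[GpuInfo]) -> Optional[int]:
    """Return the index of the largest GPU with no kernel watchdog, or None.

    Heuristic: a watchdog (run-time limit on kernels) usually means the GPU
    is also driving a display, so such GPUs are skipped.
    """
    candidates = [g for g in gpus if not g.kernel_exec_timeout_enabled]
    if not candidates:
        return None  # every GPU looks like a display GPU; let the caller decide
    return max(candidates, key=lambda g: g.total_global_mem).index


if __name__ == "__main__":
    # Made-up devices for illustration only.
    devices = [
        GpuInfo(0, "display card", 512 * 2**20, True),
        GpuInfo(1, "Tesla C1060", 4 * 2**30, False),
    ]
    print(pick_compute_gpu(devices))
```

The chosen index would then be passed to cudaSetDevice before any other CUDA work in the process.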

Gil Brandao