[Beowulf] GPU Beowulf Clusters
jlforrest at berkeley.edu
Sat Jan 30 17:30:31 PST 2010
On 1/30/2010 2:52 PM, "C. Bergström" wrote:
> Hi Jon,
> I must emphasize what David Mathog said about the importance of the gpu
> programming model.
I don't doubt this at all. Fortunately, we have lots
of very smart people here at UC Berkeley. I have
the utmost confidence that they will figure this
stuff out. My job is to purchase and configure the cluster.
> My perspective (with hopefully not too much opinion added)
> OpenCL vs CUDA - OpenCL is 1/10th as popular, lacks in features, more
> tedious to write and in an effort to stay generic loses the potential to
> fully exploit the gpu. At one point the performance of the drivers from
> Nvidia was not equivalent, but I think that's been fixed. (This does not
> mean all vendors are unilaterally doing a good job)
This is very interesting news. As far as I know, nobody is doing
anything with OpenCL in the College of Chemistry around here.
On the other hand, we've been following all the press about how
it's going to be the great unifier so that it won't be necessary
to use a proprietary API such as CUDA anymore. At this point it's too
early to do anything with OpenCL until our colleagues in
the Computer Science department have made a pass at it and
have experiences to talk about.
> Have you considered sharing access with another research lab that has
> already purchased something similar?
> (Some vendors may also be willing to let you run your codes in exchange
> for feedback.)
There's nobody else at UC Berkeley I know of who has a GPU cluster.
I don't know of any vendor who'd be willing to volunteer
their cluster. If anybody would like to volunteer, step forward.
> 1) sw thread synchronization chews up processor time
Right, but let's say right now 80% of the CPU time is spent
in routines that will eventually be done in the GPU (I'm
just making this number up). I don't see how having a faster
CPU would help overall.
> 2) Do you already know if your code has enough computational complexity
> to outweigh the memory access costs?
In general, yes. A couple of grad students have ported some
of their code to CUDA with excellent results. Plus, molecular
dynamics is well suited to GPU programming, or so I'm told.
Several of the popular open-source MD packages have already
been ported, also with excellent results.
> 3) Do you know if the GTX275 has enough vram? Your benchmarks will
> suffer if you start going to gart and page faulting
The one I mentioned in my posting has 1.8GB of RAM. If this isn't
enough then we're in trouble. The grad student I mentioned
has been using the 896MB version of this card without problems.
> 4) I can tell you 100% that not all gpu are created equally when it
> comes to handling cuda code. I don't have experience with the GTX275,
> but if you do hit issues I would be curious to hear about them.
I've heard that it's much better than the 9500GT that we first
started using. Since the 9500GT is a much cheaper card we didn't expect
much performance out of it, but the grad student who was trying
to use it said that there were problems with it not releasing memory,
resulting in having to reboot the host. I don't know the details.
> Some questions in return..
> Is your code currently C, C++ or Fortran?
The most important program for this group is in Fortran.
We're going to keep it in Fortran, but we're going to
write C interfaces to the routines that will run on
the GPU, and then write these routines in C.
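A hypothetical sketch of what one of those C wrappers might look like. All names are invented, and the calling convention shown (every argument by reference, trailing underscore on the C name) is the common one for Fortran compilers of this era, though it varies by compiler; the real routine would launch a CUDA kernel where the placeholder loop sits:

```c
#include <stddef.h>

/* Callable from Fortran as:  call gpu_forces(x, y, z, f, n)
   Fortran passes everything by reference, and many compilers append
   a trailing underscore to external names. */
void gpu_forces_(const double *x, const double *y, const double *z,
                 double *f, const int *n)
{
    /* Placeholder host-side implementation.  The production version
       would copy x/y/z to device memory, launch a CUDA kernel to
       accumulate the pair forces, and copy f back. */
    for (int i = 0; i < *n; i++)
        f[i] = 0.0;
}
```

Keeping the Fortran side unchanged and isolating the GPU work behind a small C interface like this means the chemists' main code never has to know CUDA exists.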
> Is there any interest in optimizations at the compiler level which could
> benefit molecular dynamics simulations?
Of course, but at what price? I'm talking about
both the price in dollars and the price in non-standard code.
I'm not a chemist so I don't know what would speed up MD calculations
more than a good GPU.
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
jlforrest at berkeley.edu