[Beowulf] GPU Beowulf Clusters

Singh, Tajendra tvsingh at ucla.edu
Thu Jan 28 12:57:05 PST 2010


This is not a problem in your setup, since you are assigning a whole
node to each job.  In general, though, how can one deal with the
problem of binding a particular GPU device through the scheduler?
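
What I have in mind is that each job would end up pinned to a single
device, roughly along these lines (just a rough sketch; GPU_DEVICE_ID is
a hypothetical variable that a scheduler prologue script might export,
not something SLURM or SGE sets on its own):

/* Rough sketch: bind this process to the GPU index handed down by the
 * scheduler.  GPU_DEVICE_ID is hypothetical; a prologue script would
 * have to export it and make sure jobs sharing a node get different
 * values. */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    int ndev = 0;
    if (cudaGetDeviceCount(&ndev) != cudaSuccess || ndev == 0) {
        fprintf(stderr, "no CUDA devices visible\n");
        return 1;
    }

    const char *s = getenv("GPU_DEVICE_ID");  /* assumed set by the scheduler */
    int dev = s ? atoi(s) % ndev : 0;         /* fall back to device 0 */

    if (cudaSetDevice(dev) != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice(%d) failed\n", dev);
        return 1;
    }
    printf("bound to GPU %d of %d\n", dev, ndev);
    /* ... the job's real CUDA work would then run on this device ... */
    return 0;
}

The hard part, as far as I can tell, is getting the scheduler to hand
out distinct values to jobs that land on the same node.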

Sorry if I am asking about something that is already well known and
there are established ways to bind the devices within the scheduler.

Thanks,
TV




-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org]
On Behalf Of Michael Di Domenico
Sent: Thursday, January 28, 2010 9:54 AM
To: Beowulf Mailing List
Subject: Re: [Beowulf] GPU Beowulf Clusters

Here's the way I do it, but your mileage may vary...

We allocate two CPU cores per GPU and use the NVIDIA Tesla S1070 1U
chassis product.

So a standard quad-core, dual-socket server ends up with four GPUs attached.

We've found that even though you expect the GPU to do most of the
work, it really takes a CPU core to drive the GPU and keep it busy.

Having a second CPU core to pre-stage and post-stage the memory has
worked pretty well, too.
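
Roughly, the staging core's job is to keep asynchronous copies in
flight while the GPU is still chewing on the previous chunk.  A minimal
sketch of that pattern (not our actual code; process_chunk and the
chunk sizes are made up, and h_in/h_out are assumed to be pinned host
buffers allocated with cudaMallocHost so the async copies can actually
overlap):

/* Sketch of overlapping host<->device staging with GPU work.
 * process_chunk is a stand-in kernel; h_in/h_out are assumed to be
 * pinned host buffers (cudaMallocHost) holding nchunks*chunk floats. */
#include <cuda_runtime.h>

__global__ void process_chunk(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;    /* placeholder for the real computation */
}

void run_chunks(float *h_in, float *h_out, int nchunks, int chunk)
{
    float *d_buf[2];
    cudaStream_t stream[2];
    for (int b = 0; b < 2; ++b) {
        cudaMalloc((void **)&d_buf[b], chunk * sizeof(float));
        cudaStreamCreate(&stream[b]);
    }

    for (int c = 0; c < nchunks; ++c) {
        int b = c & 1;                      /* ping-pong between two buffers */
        float *in  = h_in  + (size_t)c * chunk;
        float *out = h_out + (size_t)c * chunk;

        /* pre-stage the next chunk while the other stream is computing */
        cudaMemcpyAsync(d_buf[b], in, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, stream[b]);
        process_chunk<<<(chunk + 255) / 256, 256, 0, stream[b]>>>(d_buf[b], chunk);
        /* post-stage the results back, also asynchronously */
        cudaMemcpyAsync(out, d_buf[b], chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, stream[b]);
    }

    for (int b = 0; b < 2; ++b) {
        cudaStreamSynchronize(stream[b]);
        cudaStreamDestroy(stream[b]);
        cudaFree(d_buf[b]);
    }
}

Two streams and two device buffers are usually enough to keep the copy
engine and the compute engine busy at the same time.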

For scheduling, we use SLURM and allocate one entire node per job, with
no sharing.


On Thu, Jan 28, 2010 at 12:38 PM, Jon Forrest <jlforrest at berkeley.edu>
wrote:
> I'm about to spend ~$20K on a new cluster
> that will be a proof-of-concept for doing
> GPU-based computing in one of the research
> groups here.
>
> A GPU cluster is different from a traditional
> HPC cluster in several ways:
>
> 1) The CPU speed and number of cores are not
> that important because most of the computing will
> be done inside the GPU.
>
> 2) Serious GPU boards are large enough that
> they don't easily fit into standard 1U pizza
> boxes. Plus, they require more power than the
> standard power supplies in such boxes can
> provide. I'm not familiar with the boxes
> that therefore should be used in a GPU cluster.
>
> 3) Ideally, I'd like to put more than one GPU
> card in each computer node, but then I hit the
> issues in #2 even harder.
>
> 4) Assuming that a GPU can't be "time shared",
> this means that I'll have to set up my batch
> engine to treat the GPU as a non-sharable resource.
> This means that I'll only be able to run as many
> jobs on a compute node as I have GPUs. This also means
> that it would be wasteful to put CPUs in a compute
> node with more cores than the number of GPUs in the
> node. (This is assuming that the jobs don't do
> anything parallel on the CPUs - only on the GPUs).
> Even if GPUs can be time shared, given the expense
> of copying between main memory and GPU memory,
> sharing GPUs among several processes will degrade
> performance.
>
> Are there any other issues I'm leaving out?
>
> Cordially,
> --
> Jon Forrest
> Research Computing Support
> College of Chemistry
> 173 Tan Hall
> University of California Berkeley
> Berkeley, CA
> 94720-1460
> 510-643-1032
> jlforrest at berkeley.edu
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf



