Parallel batch jobs on beowulf?

Justin Moore justin at cs.duke.edu
Mon Oct 1 15:04:00 PDT 2001


> We have a small research cluster at our CS dept., it's got 32 compute nodes.
> We run debian and the setup is a typical beowulf. (locally installed
> software, nfs, nis, mpich, lam, etc.)
>
> An instructor asked us whether it would be possible to run a parallel job
> system. I know a regular batch system (like pbs) could be used to that end,
> but what is the recommended way of providing parallel batch jobs on a Beowulf
> system?
>
> What he asked of course was the ability to allocate the whole cluster to a
> single job so that people can do benchmarks. Now, that is only useful while
> you're doing benchmarks and not necessary otherwise since development is
> usually done in an interactive manner and in case of severe resource
> conflicts developers can agree rather easily...
>
> Therefore, what do you think is the best practice in such research clusters?

   This isn't really Beowulf, but you may want to look into something like
Emulab (http://www.emulab.net/).  It's from U of Utah and allows research
groups to partition off chunks of hosts for limited periods of time.  On
the downside, it requires a good deal of overhead to set up.  In a similar
vein, there's www.rackspace.com and other companies.

   For a quick and dirty (free) solution, you could simply set up two PBS
queues.  Let the first be for dedicated jobs (don't run anything else on
that queue, even if the current job isn't taking up all available hosts),
and the second be for jobs that can run alongside one another.  You'll have
to hand-tune the size of each sub-cluster, but as long as the "verbal
locking" between users works on the first queue, it should be relatively
easy to set up.
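
   To make the two-queue policy concrete, here is a rough sketch in Python.
It is only a toy model of the admission rules, not PBS configuration: the
Queue class, make_queues(), and the 24/8 split of the 32 nodes are all
hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    """Toy model of one queue bound to a fixed sub-cluster (hypothetical)."""
    name: str
    nodes: int        # size of this queue's sub-cluster (hand-tuned)
    exclusive: bool   # True: at most one running job, whatever its size
    running: dict = field(default_factory=dict)  # job name -> nodes held

    def free_nodes(self) -> int:
        return self.nodes - sum(self.running.values())

    def can_start(self, want: int) -> bool:
        if want < 1 or want > self.nodes:
            return False
        if self.exclusive:
            # Dedicated queue: idle hosts stay idle while any job is running.
            return not self.running
        # Shared queue: start the job if enough hosts are free right now.
        return want <= self.free_nodes()

    def start(self, job: str, want: int) -> bool:
        if not self.can_start(want):
            return False
        self.running[job] = want
        return True

    def finish(self, job: str) -> None:
        self.running.pop(job, None)


def make_queues(total_nodes: int, dedicated_nodes: int):
    """Split the cluster into a dedicated and a shared sub-cluster."""
    if not 0 < dedicated_nodes < total_nodes:
        raise ValueError("dedicated_nodes must leave hosts for both queues")
    return (Queue("dedicated", dedicated_nodes, exclusive=True),
            Queue("shared", total_nodes - dedicated_nodes, exclusive=False))


if __name__ == "__main__":
    dedicated, shared = make_queues(total_nodes=32, dedicated_nodes=24)
    print(dedicated.start("bench1", 8))   # benchmark gets the queue to itself
    print(dedicated.start("bench2", 4))   # refused although 16 hosts are idle
    print(shared.start("dev1", 4), shared.start("dev2", 4))
```

   The point of the exclusive flag is that a benchmark on the dedicated
queue never competes with another job, at the cost of idle hosts.  In a
real setup the same rules would be expressed through the batch system's own
queue limits rather than code like this.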

-jdm

Department of Computer Science, Duke University, Durham, NC 27708-0129
Email:  justin at cs.duke.edu




