Queueing problem

Drake Diedrich Drake.Diedrich at anu.edu.au
Wed Jan 16 17:22:44 PST 2002

On Wed, Jan 16, 2002 at 01:22:47PM +0100, Peter H. Koenig wrote:
> Hello,
> recently we acquired new machines we want to integrate into our
> computational workforce. We are currently using a DQS complex (A) of
> alpha-workstations.
> The new machines are integrated into two complexes:
> (B) a beowulf-style cluster of Linux-PC including a headnode mainly for
> parallel applications and development
> (C) a pool of workstations for a (student-) computer lab, which can be
> used for short calculations
> We are also planning on investing in a further cluster (D) which may be
> open for other groups.
> Since the user base for each of the complexes (except for A and B) is
> different we think that we might need to separate the complexes.

   Sounds like it.  Are students also submitting batch jobs, or only
running interactive jobs?  If the goal is just to use the idle cycles on the
students' interactive machines, having qidle start at login may be all
that is necessary to keep batch jobs from interfering with their work.
Setting the queue priority so that all jobs are automatically niced, and
using load_masg/load_alarm to discourage scheduling on the C nodes while
A/B nodes are available, should also reduce the impact on students even when
qidle isn't running.
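
   As a rough illustration of what the priority and load settings are meant
to achieve, the Python sketch below ranks candidate nodes for a new batch
job.  It is a toy model, not DQS's scheduler: the Node type, the group labels
and the fixed C-node penalty are hypothetical, and the real semantics of
load_masg and load_alarm are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    group: str   # "A", "B" or "C" (hypothetical labels for the complexes)
    load: float  # current load average


def candidate_nodes(nodes, load_alarm=1.0, penalty=None):
    """Return nodes eligible for a new batch job, best first.

    Nodes at or above load_alarm are skipped entirely (someone is
    probably working interactively there).  The rest are ranked by an
    adjusted load in which C nodes carry an extra penalty, so they are
    only picked after every eligible A/B node.
    """
    if penalty is None:
        penalty = {"A": 0.0, "B": 0.0, "C": 10.0}  # hypothetical weights
    eligible = [n for n in nodes if n.load < load_alarm]
    return sorted(eligible, key=lambda n: n.load + penalty.get(n.group, 0.0))
```

With nodes a1 (A, load 0.5), c1 (C, idle), b1 (B, load 1.5) and c2 (C, load
2.0), only a1 and c1 are eligible, and a1 is preferred over the idle lab
machine.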

   If both AB and C users are submitting jobs, and you want each group to
have lower priority on the other's nodes, you can do that by putting two
queues on each node, one for the AB users and one for the C users.  One
should be subordinated to the other, so that its jobs are suspended when
someone with greater priority on that set of nodes queues a job.  You'll want
plenty of swap, so that suspended jobs can be paged out instead of holding
memory the active job needs.  user_acls or REQUIRED resources can be used to
limit jobs to the allowed queue on each node.  I suppose a consumable
resource for the student computers could limit jobs to a certain fraction of
the C nodes (I've never used them myself), but if you're suspending
completely when students are using the C nodes, I see no reason not to queue
jobs on all the low-priority C queues at once.
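
   A toy Python model of the two-queues-per-node arrangement on a C node,
assuming the students' queue is the superior one and the AB users' queue is
subordinate to it.  The class names and the allowed_users sets (standing in
for user ACLs) are hypothetical; in DQS itself this is queue configuration,
not user code.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    name: str
    allowed_users: set  # stands in for a user ACL (hypothetical model)
    running: list = field(default_factory=list)
    suspended: bool = False


@dataclass
class NodeQueues:
    """One node carrying two queues; `low` is subordinate to `high`."""
    high: Queue
    low: Queue

    def submit(self, queue, user, job):
        # Enforce the ACL so each group can only use its own queue.
        if user not in queue.allowed_users:
            raise PermissionError(f"{user} may not use queue {queue.name}")
        queue.running.append(job)
        self._update_subordination()

    def finish(self, queue, job):
        queue.running.remove(job)
        self._update_subordination()

    def _update_subordination(self):
        # The subordinate queue is suspended while the superior queue has
        # any running job, and resumes as soon as it is empty again.
        self.low.suspended = bool(self.high.running)
```

A batch job from an AB user keeps running on an idle lab machine until a
student's job arrives in the superior queue, then stays stopped (and, with
enough swap, paged out) until that job finishes.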

   I'd restrict parallel jobs to just the B nodes, though, using a resource
specified in their queues that I'd strongly encourage all parallel users to
request when queueing their jobs.  Otherwise a single suspended C node in a
large parallel job could suspend the entire job, while still tying up many
B nodes until the student finishes and the parallel computation can continue.
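
   A Python sketch of the resource-matching idea, with a hypothetical
resource name `parallel` that only B-node queues advertise.  The second
function shows why mixing in C nodes hurts: a tightly coupled job only makes
progress while none of its slots is suspended.

```python
def eligible_queues(queues, required):
    """Names of queues whose advertised resources include every required one.

    queues maps queue name -> set of resource names it offers.
    """
    return [name for name, offered in queues.items() if required <= offered]


def job_is_running(assigned, suspended_queues):
    """A tightly coupled parallel job progresses only if no slot is stopped;
    one suspended process stalls all the others waiting on it."""
    return not any(q in suspended_queues for q in assigned)
```

A job that requests {"parallel"} matches only the B queues, whereas one that
requests nothing can land on c01 and stall as soon as a student logs in there.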

> The jobs are to be submitted on the workstations (A) and routed to the
> appropriate queue for execution. The submission and routing of jobs
> should be possible with least involvement of the user. It should be
> possible to restrict routing to other complexes to certain rules e.g.
> routing to the computer lab should only be possible if a given
> percentage of the queues there is idle (for allowing local submissions
> of jobs, which should start without larger delays).

> As far as I understand the documentation, DQS _does_ allow routing to
> other complexes, but I have neither seen any information on how this can
> be accomplished nor on whether rules for routing can be specified.

   There's intercell routing, but I've never known anyone to use it.  If
your cells are tied closely enough that users and files are the same and
jobs are likely to be portable, there seems little point in running
separate qmasters.

> Can this be accomplished transparently to the user ? Can someone point
> me to a queuing software which allows the specification of such rules
> (even if this means quitting DQS)?

   With more work, you could specify required resources on all queues and
have your own userspace code that runs through orphaned jobs and qalters
their requirements so they end up on the appropriate nodes.  This might get
you exactly what you want, but I don't know of any examples.
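
   The routing rule from the original question (send work to the lab only
when enough of it is idle) could be computed by such a userspace tool roughly
as below.  This is a hypothetical Python sketch: PendingJob, route() and the
50% threshold are made up for illustration, and the step that would apply
each decision with qalter is left out rather than guessing at its options.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingJob:
    job_id: int
    parallel: bool
    target: Optional[frozenset] = None  # None = not yet routed ("orphaned")


def idle_fraction(lab_idle):
    """lab_idle maps each lab (C) queue name to True if that queue is idle."""
    if not lab_idle:
        return 0.0
    return sum(lab_idle.values()) / len(lab_idle)


def route(job, lab_idle, min_lab_idle=0.5):
    """Pick the set of complexes a job may run in (hypothetical policy)."""
    if job.parallel:
        return frozenset({"B"})            # parallel work stays on the cluster
    if idle_fraction(lab_idle) >= min_lab_idle:
        return frozenset({"A", "B", "C"})  # lab still has room for local jobs
    return frozenset({"A", "B"})


def route_orphans(jobs, lab_idle, min_lab_idle=0.5):
    """Route every unrouted job; return {job_id: target} for the changes.

    A real tool would turn each change into a qalter of the job's
    required resources; that step is deliberately not shown.
    """
    changes = {}
    for job in jobs:
        if job.target is None:
            job.target = route(job, lab_idle, min_lab_idle)
            changes[job.job_id] = job.target
    return changes
```

Running it periodically, for example from cron, would let the threshold track
how busy the lab is, while jobs that already have a target are left alone.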

   Generic NQS had routing queues that could probably have done all this
more naturally, but it didn't support multinode jobs (just SMP jobs).  I'm
not sure how easy PBS plug-in schedulers are to write, or what enhancements
SGE has added to DQS yet.
