[Beowulf] SGE + policy

Sean Dilda sean at duke.edu
Thu May 27 11:44:17 PDT 2004


On Thu, 2004-05-27 at 11:13, Robert G. Brown wrote:
> On Thu, 27 May 2004, Orion Poplawski wrote:
> 
> > You can also setup subordinate queues where the queue with priority will 
> > only accept jobs that will take less than a given time to run.  When a 
> > short job gets submitted, a long job on the subordinate queue will be 
> > stopped (SIGSTOP) while the short job runs.  Your problem here is that
> > the long job will presumably still hold a license.  If matlab has some 
> > kind of checkpointing function, you could tie that into SGE to release 
> > the license.
> 
> So it WILL manage things with SIGSTOP/SIGCONT.  Good, that's what I was
> hoping.  I'll search the docs etc for "subordinate queues" and signals
> to see if I can figure out how.
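For anyone unfamiliar with the mechanism, the suspend/resume cycle described above can be sketched at the process level. This is a minimal, POSIX-only Python illustration, not SGE code. A hypothetical stand-in for the long job is stopped with SIGSTOP and later resumed with SIGCONT, and os.waitpid() with WUNTRACED/WCONTINUED confirms each state change. SIGSTOP cannot be caught or ignored. A stopped process also keeps its open files and network connections, which is why a suspended MATLAB job would presumably still hold its license.

```python
import os
import signal
import subprocess
import sys


def suspend(pid: int) -> None:
    """Stop a job the way a subordinate queue does (SIGSTOP cannot be caught)."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid: int) -> None:
    """Let a stopped job carry on where it left off."""
    os.kill(pid, signal.SIGCONT)


def demo() -> list:
    events = []
    # Hypothetical stand-in for a long job sitting in the subordinate queue.
    long_job = subprocess.Popen(
        [sys.executable, "-c", "import time\nwhile True: time.sleep(0.1)"]
    )
    try:
        # A short job arrives in the high priority queue: stop the long job.
        suspend(long_job.pid)
        _, status = os.waitpid(long_job.pid, os.WUNTRACED)
        if os.WIFSTOPPED(status):
            events.append("stopped")
        # The short job finishes: resume the long job.
        resume(long_job.pid)
        _, status = os.waitpid(long_job.pid, os.WCONTINUED)
        if os.WIFCONTINUED(status):
            events.append("continued")
    finally:
        # Clean up the stand-in job and reap it.
        long_job.kill()
        long_job.wait()
    return events


if __name__ == "__main__":
    print(demo())
```

On Linux or macOS, demo() should report both the stop and the continue. The long job never sees either signal itself; it simply makes no progress while stopped.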

We were using subordinate queues for a while with the "owned" nodes in
the CSEM cluster.  Unfortunately, it turns out that subordinate queues
and MPI jobs don't play well together, but I don't think Econ is
planning on doing much with MPI jobs.

One thing you might want to consider is adding a complex to the "high
priority" queue.  This is just a resource that can be requested in order
to get a job submitted into the high priority queue.  If you set the
complex's requestable flag to FORCED, SGE will only put jobs in those
queues if they explicitly request the complex.  You may also want to
look into resource limits.  You can place these on the "high priority"
queues so that jobs which run for too long are killed.  Note that I
haven't used resource limits myself, so I'm not sure how well they work.
Based on my experience with SGE, I suspect a job might be able to evade
the kill signal they send, but I haven't experimented with it enough to
be certain.
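To make the FORCED behaviour concrete, here is a toy Python model of queue admission. It is a sketch under stated assumptions, not SGE's scheduler. The queue names, the complex name high_pri and the one-hour runtime cap are all hypothetical. In this model a queue accepts a job only if three conditions hold: every complex the job requests is attached to the queue, every FORCED complex on the queue is requested by the job, and the job's requested runtime fits under the queue's cap. The last condition mirrors the short-jobs-only high priority queue from the quoted message.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Queue:
    name: str
    offered: set[str] = field(default_factory=set)  # complexes attached to the queue
    forced: set[str] = field(default_factory=set)   # subset with requestable=FORCED
    max_runtime: Optional[int] = None               # seconds; None means no cap


@dataclass
class Job:
    name: str
    requested: set[str] = field(default_factory=set)  # complexes the job asks for
    runtime: int = 0                                   # requested runtime in seconds


def accepts(queue: Queue, job: Job) -> bool:
    """Toy admission check; not SGE's real scheduling algorithm."""
    # A job cannot go where a resource it requested is not offered.
    if not job.requested <= queue.offered:
        return False
    # FORCED: the queue only takes jobs that explicitly request the complex.
    if not queue.forced <= job.requested:
        return False
    # Short-jobs-only queue: reject anything asking for more than the cap.
    if queue.max_runtime is not None and job.runtime > queue.max_runtime:
        return False
    return True


def pick_queue(queues: list[Queue], job: Job) -> Optional[str]:
    """Return the name of the first queue that accepts the job, else None."""
    for queue in queues:
        if accepts(queue, job):
            return queue.name
    return None


if __name__ == "__main__":
    high = Queue("high", offered={"high_pri"}, forced={"high_pri"}, max_runtime=3600)
    low = Queue("low")
    jobs = (Job("short", {"high_pri"}, 3600), Job("plain"), Job("long", {"high_pri"}, 7200))
    for job in jobs:
        print(job.name, "->", pick_queue([high, low], job))
```

With the queues listed high priority first, a one-hour job requesting high_pri lands in "high". A job that requests nothing falls through to "low". A two-hour job requesting high_pri matches neither queue, which in this model means it would simply wait.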

Also, http://gridengine.sunsource.net/ hosts a users' mailing list for
SGE that might be another good source of information for you.


Sean
 



