[Beowulf] scheduler policy design

Thu Apr 26 02:20:58 PDT 2007

On 26 Apr 2007, at 10:06 am, Toon Knapen wrote:

> Tim Cutts wrote:
>> The compromise we ended up with is this set of LSF queues on our  
>> system (a cluster with about 1500 job slots):
>> QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
>> yesterday       500  Open:Active     200   10    -    -     1     0     1     0
>> normal           30  Open:Active       -    -    -    -   281   110   171     0
>> hugemem          30  Open:Active       -    -    -    -     3     0     3     0
>> long              3  Open:Active       -    -    -    -  4022  2987  1035     0
>> basement          1  Open:Active     300  200    -    -   127     0   127     0
>> yesterday:
>> a special purpose high priority queue for the "I need it  
>> yesterday" crowd.  No run length limits, but very limited in terms  
>> of how many slots the user can use.
>
>
> Do you have slots reserved exclusively for the 'yesterday' queue or  
> for any of the other queues?

No, yesterday is just the highest priority queue, so when a slot  
becomes available anywhere, yesterday jobs tend to get it.  Given the  
number of job slots we have (more than 1,500) and the various limits  
that are in place, the pathological corner cases which would stop a  
yesterday job getting onto the system within a couple of minutes are  
pretty rare (and in fact I have not yet seen it happen).  Even if the  
system were full of jobs running for the full 24 hour maximum, you'd  
get a slot coming free on average every minute or so.  I should point  
out here that the vast majority of our jobs are serial single  
processor jobs solving embarrassingly parallel problems.  If we start  
to get significant multi-CPU jobs we may have to re-think this strategy.
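
The "every minute or so" figure can be checked with a rough back-of-envelope calculation. The sketch below is only an illustration: the function name is made up, and it assumes that every slot holds a job running for the full limit and that job end times are spread evenly across the 24 hour window, which a real workload will not match exactly.

```python
def mean_minutes_between_free_slots(total_slots: int, max_runtime_hours: float) -> float:
    """Average gap, in minutes, between slots coming free.

    Assumes every slot is occupied by a job that runs for the full
    run-time limit, and that job end times are evenly staggered.
    """
    return max_runtime_hours * 60 / total_slots


# Figures from this thread: about 1500 job slots, 24 hour maximum run time.
gap = mean_minutes_between_free_slots(total_slots=1500, max_runtime_hours=24)
print(f"{gap:.2f} minutes between free slots")  # prints "0.96 minutes between free slots"
```

With shorter jobs in the mix, which is the usual case, slots turn over faster than this worst-case estimate.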

The only queue which has dedicated slots is hugemem, because it's  
specifically for the Altixes (and none of the other queues can send  
jobs to the Altixes).  We don't dedicate any other machines to  
individual queues or purposes, because doing so would reduce the  
cluster's throughput unless it was *extremely* carefully managed.  My  
personal view is that it's only worth dedicating nodes to a  
particular task type if you can guarantee that there are enough of  
those tasks available to keep the specialised nodes continually busy;  
in which case you effectively have a second cluster dedicated to that  
task.

Tim