[Beowulf] scheduler policy design

Bill Bryce bill at platform.com
Tue Apr 24 06:13:47 PDT 2007

Toon describes a problem below where the scheduler believes 4 jobs can
co-exist on a single node, but they cannot because they are I/O (disk)
bound jobs and will thrash the system.

There are several ways to solve this in LSF; here are two.

1) Create a new resource for this type of job, call it 'widgets', and when
the job is submitted tell LSF that this type of job consumes 1 widget.
That solves the problem of LSF knowing the job is a 'big I/O job'. Then
configure limits on the widget resource at the queue, host, or user level
(or something more complicated). For example, configure the hosts so that
a particular host cannot have more than 1 widget job running. With this
configuration LSF will know that it cannot run more than 1 such job on
that host.
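
To make the idea in 1) concrete, here is a minimal Python sketch of the
dispatch rule it describes. It is a toy model, not LSF configuration or LSF
code: the Host class, the widget_limit field, and the job dictionaries are
all hypothetical names invented for illustration. A job only starts if the
host has a free slot and enough unused widgets.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Toy host model; every field name here is hypothetical."""
    name: str
    slots: int                 # CPU slots the scheduler sees
    widget_limit: int = 1      # per-host cap on the 'widgets' resource
    running: list = field(default_factory=list)

    def widgets_in_use(self):
        return sum(job["widgets"] for job in self.running)

    def can_run(self, job):
        has_slot = len(self.running) < self.slots
        has_widgets = self.widgets_in_use() + job["widgets"] <= self.widget_limit
        return has_slot and has_widgets


def dispatch(host, pending):
    """Start every pending job that fits; return the jobs left waiting."""
    waiting = []
    for job in pending:
        if host.can_run(job):
            host.running.append(job)
        else:
            waiting.append(job)
    return waiting


host = Host("node01", slots=4)
jobs = [
    {"name": "io1", "widgets": 1},   # big I/O job, consumes 1 widget
    {"name": "io2", "widgets": 1},   # second big I/O job
    {"name": "cpu1", "widgets": 0},  # CPU-bound jobs consume no widgets
    {"name": "cpu2", "widgets": 0},
]
waiting = dispatch(host, jobs)
```

In this model the host still fills three of its four slots, but the second
I/O job waits even though a slot is free, which is the behaviour the widget
limit is meant to produce.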

This is a simple solution that is easy to understand, but it has
limitations: if the job is really only I/O bound for a period of
time, then the machine is actually underutilized once the job 'gets

2) Use the LSF resource reservation mechanism. This is more complex, but
it boils down to telling LSF to 'bump up the resource usage' on a
resource, making it look like more I/O is consumed than really is
consumed for a given period of time, and to apply a decay function so
that the 'artificial bump in I/O' decreases over time. Once you have
configured this, you submit the job to LSF telling it to use the
resource reservation and decay. When the job starts, the scheduler
'believes that the job is taking lots of I/O' even though it is not, and
does not start a second one (since you configured the host to only start
1 when I/O is high). Then, as the artificial I/O load decays, the real
I/O load kicks in after 4

So the scheduler won't schedule two jobs when the I/O is high. However,
if this type of job is 'high I/O at the beginning but much less later',
then two jobs can run together: the second one will start after the I/O
(reservation + real I/O) falls below the threshold set in your queues or
hosts for job dispatch.
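
The reservation-plus-decay behaviour in 2) can be sketched as a small Python
model. This is an illustration under stated assumptions, not LSF's
implementation: the decay is assumed to be linear, and the job's real I/O
profile, the threshold, and all function names are made up for the example.
The scheduler is modelled as dispatching the next job at the first minute
where real I/O plus the remaining reservation is below the threshold.

```python
def reserved_io(t, amount, duration):
    """Assumed linear decay: `amount` at t=0, zero from `duration` onward."""
    if t >= duration:
        return 0.0
    return amount * (1.0 - t / duration)


def real_io(t):
    """Hypothetical job profile: quiet start, heavy I/O for a while, then light."""
    if t < 4:
        return 5.0
    if t < 10:
        return 80.0
    return 10.0


def first_dispatch_time(threshold, amount, duration, horizon=60):
    """First minute at which real + reserved I/O is below the threshold."""
    for t in range(horizon):
        if real_io(t) + reserved_io(t, amount, duration) < threshold:
            return t
    return None


# Without a reservation the host looks idle right away, so a second job
# would be dispatched before the first one has started its heavy I/O.
no_reservation = first_dispatch_time(threshold=50, amount=0, duration=8)

# With a decaying reservation the artificial load covers the quiet start,
# and the real load has taken over by the time the reservation runs out.
with_reservation = first_dispatch_time(threshold=50, amount=100, duration=8)
```

With these made-up numbers the second job is held back until the first
job's heavy I/O phase is over. The reservation has to last at least as long
as the job's quiet start, otherwise there is a window where the host looks
idle again.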

Finally, if you don't like those mechanisms, you can create any type of
'resource or attribute' you want and apply it as a limit to LSF
scheduling, so if needed you can create more complex I/O resources and
use them for your scheduling.



-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org]
On Behalf Of Toon Knapen
Sent: Tuesday, April 24, 2007 8:31 AM
To: Tim Cutts
Cc: 'beowulf at beowulf.org'
Subject: Re: [Beowulf] scheduler policy design

Tim Cutts wrote:

>> but what if you have a bi-cpu bi-core machine to which you assign 4 
>> slots. Now one slot is being used by a process which performs heavy 
>> IO. Suppose another process is launched that performs heavy IO. In 
>> that case the latter process should wait until the first one is done 
>> to avoid slowing down the efficiency of the system. Generally
>> clusters take only time and memory requirements into account.
> I think that varies.  LSF records the current I/O of a node as one of 
> its load indices, so you can request a node which is doing less than a
> certain amount of I/O.  I imagine the same is true of SGE, but I 
> wouldn't know.

Indeed, using SGE you could also take this into account. However if 
someone submits 4 jobs, the jobs do not directly start to generate heavy
I/O. So the scheduler might think that the 4 jobs can easily coexist on 
this same node. However, after a few minutes all 4 jobs start eating 
disk BW and slow the node down horribly. What would your suggestion be 
to solve this ?


