[Beowulf] scheduler policy design

Thu Apr 19 08:05:14 PDT 2007

On 19 Apr 2007, at 3:20 pm, Toon Knapen wrote:

> Tim Cutts wrote:
>> Optimising for throughput, at least with an embarrassingly  
>> parallel workload of serial jobs like we have here, is trivial; a  
>> single first-come-first-served queue is optimal, as long as the  
>> code is well written, and doesn't block too much on shared  
>> resources like file servers or databases.
>
>
> but what if you have a dual-CPU, dual-core machine to which you
> assign 4 slots? Now one slot is being used by a process which
> performs heavy IO. Suppose another process is launched that also
> performs heavy IO. In that case the latter process should wait until
> the first one is done, to avoid reducing the efficiency of the
> system. Generally, however, cluster schedulers take only time and
> memory requirements into account.

I think that varies.  LSF records the current I/O of a node as one of  
its load indices, so you can request a node which is doing less than  
a certain amount of I/O.  I imagine the same is true of SGE, but I  
wouldn't know.
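
To make the idea concrete, here is a toy Python sketch of what filtering
on an I/O load index amounts to. It is not LSF code: the host names, the
io_kb_s field and the eligible_hosts helper are all invented for
illustration, and a real scheduler weighs many more factors.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    io_kb_s: float   # hypothetical: recent disk I/O rate in KB per second
    free_slots: int  # hypothetical: job slots not currently in use


def eligible_hosts(hosts, max_io_kb_s):
    """Return hosts whose I/O rate is below the threshold, quietest first."""
    quiet = [h for h in hosts if h.io_kb_s < max_io_kb_s]
    return sorted(quiet, key=lambda h: h.io_kb_s)


# Made-up snapshot of three nodes.
hosts = [Host("node1", 250.0, 3), Host("node2", 4.0, 4), Host("node3", 8.5, 1)]
print([h.name for h in eligible_hosts(hosts, 10)])  # ['node2', 'node3']
```

A second I/O-heavy job submitted with such a requirement would simply not
be placed on node1 while the first one is still hammering the disk.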

> Additionally, in the case above, for optimising the efficiency of  
> the node, I might prefer to launch just 1 process which uses 4  
> threads to perform multi-threaded (BLAS) calculations.

That could certainly be requested with LSF:

bsub -n 4 -R"select[io < 10] span[hosts=1]" my_four_thread_job

selects a host currently doing less than 10 KB per second of I/O, and
requests four job slots, all on that single node.
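
As a rough model of how those two requirements combine, here is a small
Python sketch. It only mimics the effect of the select[] and span[]
strings; pick_host and the host tuples are assumptions made up for this
example and have nothing to do with how LSF is implemented.

```python
def pick_host(hosts, slots, max_io_kb_s):
    """Pick one host that meets both requirements, or None if none does.

    hosts is a list of (name, io_kb_s, free_slots) tuples (made-up data).
    All slots must come from the same host, which is the point of
    span[hosts=1]: free slots on different hosts are never combined.
    """
    for name, io_kb_s, free_slots in hosts:
        if io_kb_s < max_io_kb_s and free_slots >= slots:
            return name
    return None  # nothing fits yet, so the job would stay pending


hosts = [
    ("node1", 250.0, 4),  # enough slots, but too much I/O
    ("node2", 4.0, 3),    # quiet, but only three free slots
    ("node3", 8.5, 1),    # quiet, but only one free slot
    ("node4", 6.0, 4),    # quiet and four free slots
]
print(pick_host(hosts, slots=4, max_io_kb_s=10))      # node4
print(pick_host(hosts[:3], slots=4, max_io_kb_s=10))  # None
```

Note that in the second call node2 and node3 have four free slots between
them, but the job still waits, because a four-thread process cannot be
split across two machines.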

Tim