[Beowulf] Do these SGE features exist in Torque?

Reuti reuti at staff.uni-marburg.de
Mon May 12 10:11:40 PDT 2008

On 12.05.2008 at 18:01, Craig Tierney wrote:

> Reuti wrote:
>> Hiho,
>> On 12.05.2008 at 15:14, Prentice Bisbal wrote:
>>>>> It's still an RFE in SGE to request an arbitrary combination of
>>>>> resources. E.g. if one job needs 1 host with big I/O, 2 with huge
>>>>> memory and 3 "standard" nodes, in Torque you could request:
>> -l nodes=1:big_io+2:mem+3:standard
>> (Although this syntax has its own pitfalls: in my tests,
>> -l nodes=4:ppn=1 might still allocate 2 or more slots on one node.)
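As a rough illustration of how such a Torque request decomposes, here is a minimal Python sketch that splits a "nodes=" value into chunks. It assumes a simplified grammar (each chunk starts with a node count, followed by properties or "ppn=N"); real Torque also accepts host names and further options. NodeChunk and parse_nodes_spec are hypothetical names, not part of Torque.

```python
from dataclasses import dataclass, field


@dataclass
class NodeChunk:
    count: int                                            # nodes in this chunk
    properties: list[str] = field(default_factory=list)  # e.g. ["big_io"]
    ppn: int = 1                                          # processors per node


def parse_nodes_spec(spec: str) -> list[NodeChunk]:
    """Split a simplified 'nodes=' value such as '1:big_io+2:mem+3:standard'
    or '4:ppn=1' into chunks (assumed grammar, not Torque's full syntax)."""
    chunks = []
    for part in spec.split("+"):           # '+' separates chunks
        fields = part.split(":")           # ':' separates count and attributes
        chunk = NodeChunk(count=int(fields[0]))
        for attr in fields[1:]:
            if attr.startswith("ppn="):
                chunk.ppn = int(attr[len("ppn="):])
            else:
                chunk.properties.append(attr)  # treat anything else as a node property
        chunks.append(chunk)
    return chunks


for c in parse_nodes_spec("1:big_io+2:mem+3:standard"):
    print(c)
```

Note that the parsed "4:ppn=1" request only says "4 chunks of 1 processor"; nothing in the request itself forces those chunks onto 4 distinct hosts, which is the pitfall described above.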
> You mean the syntax has its pitfalls in Torque,

In Torque, as it is implemented now: with ppn=1 I want one core per
node, but I might end up with any other allocation. AFAICS they don't
have an allocation rule like SGE's, where you can set a fixed 1, 2,
$round_robin et al.
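For comparison, a toy Python model of the SGE parallel environment allocation_rule values (a fixed integer, $pe_slots, $fill_up, $round_robin) might look like this. It is a simplified sketch, assuming hosts are tried in a fixed order and only free slot counts matter; the real scheduler also weighs load, queue sequence numbers and more. The function allocate and the host names are hypothetical.

```python
def allocate(slots: int, free: dict[str, int], rule) -> dict[str, int]:
    """Return host -> granted slots for a request of `slots`.

    `rule` is an int (fixed slots per host), "$pe_slots", "$fill_up" or
    "$round_robin". Hosts are tried in the order of `free`. Raises
    ValueError if the request cannot be satisfied.
    """
    hosts = list(free)
    alloc: dict[str, int] = {}
    if isinstance(rule, int):
        # Fixed rule: exactly `rule` slots on every host that is used.
        if slots % rule:
            raise ValueError("slot count is not a multiple of the fixed rule")
        needed = slots // rule
        for h in hosts:
            if len(alloc) == needed:
                break
            if free[h] >= rule:
                alloc[h] = rule
        if len(alloc) < needed:
            raise ValueError("not enough hosts with enough free slots")
        return alloc
    if rule == "$pe_slots":
        # All slots on one single host.
        for h in hosts:
            if free[h] >= slots:
                return {h: slots}
        raise ValueError("no single host has enough free slots")
    if rule == "$fill_up":
        # Fill each host completely before moving to the next one.
        remaining = slots
        for h in hosts:
            if remaining == 0:
                break
            take = min(free[h], remaining)
            if take:
                alloc[h] = take
                remaining -= take
        if remaining:
            raise ValueError("not enough free slots")
        return alloc
    if rule == "$round_robin":
        # Hand out one slot per host in turn until the request is met.
        left = dict(free)  # copy, so the caller's dict is not modified
        remaining = slots
        while remaining:
            progressed = False
            for h in hosts:
                if remaining and left[h] > 0:
                    alloc[h] = alloc.get(h, 0) + 1
                    left[h] -= 1
                    remaining -= 1
                    progressed = True
            if not progressed:
                raise ValueError("not enough free slots")
        return alloc
    raise ValueError(f"unknown rule {rule!r}")


free = {"n1": 4, "n2": 4, "n3": 4, "n4": 4}
print(allocate(4, free, 1))
print(allocate(4, free, "$fill_up"))
```

With a fixed rule of 1, the 4-slot request gets exactly one slot on each of the four hosts, which is what ppn=1 is meant to express; with "$fill_up" the same request lands on n1 alone.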

> or how SGE may implement
> it?

If it were implemented in SGE, how to interpret this expression would
be a point of discussion.

>   I personally like the way SGE allocates nodes.  I can control how
> they get nodes.  When a user asks for 16 processors (cores, slots,
> whatever) they should get N nodes that have M processors each, and
> N*M=16.  If a user needs to specify ppn=2 (or 4 or 8), it means they
> will mess it up, causing jobs to share nodes and adversely impact
> each other, which I don't want.

Since ppn=2 allocates 2 slots on a node (and blocks them from further
use), it shouldn't interfere with other users. I have never seen a
problem with it.

But with an old Linda version (<= Gaussian 03 C.02) we had the problem
that you had to specify the nodes and, in addition, how many slots to
use on each of them: 1, 2 or 4 slots, and it had to be the same on all
nodes. If a node appeared twice in the list, Linda complained. So the
only option was to request full nodes with "-l nodes=2:ppn=4" and
sometimes wait, although an allocation of 2+2+2+2 was possible and free
in the cluster. But requesting "-l nodes=4:ppn=2" could end up with an
allocation of 4+2+2, i.e. two of the 2-slot chunks on the same node.
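The Linda constraint can be checked mechanically. A minimal Python sketch, assuming a per-slot host list like the one Torque writes to $PBS_NODEFILE (one hostname per allocated slot): collapse it into one (host, slots) entry per node and reject it if the slot counts differ. linda_host_list is a hypothetical helper, not part of Linda or Gaussian.

```python
from collections import Counter


def linda_host_list(slot_hosts: list[str]) -> list[tuple[str, int]]:
    """Collapse a per-slot host list into (host, slots) pairs.

    Raises ValueError unless every host got the same number of slots,
    mirroring the restriction of the old Linda versions described above.
    """
    counts = Counter(slot_hosts)  # host -> number of slots, first-seen order
    if not counts:
        raise ValueError("empty allocation")
    if len(set(counts.values())) != 1:
        raise ValueError(f"non-uniform allocation: {dict(counts)}")
    return list(counts.items())


# A 2+2+2+2 allocation is accepted ...
print(linda_host_list(["n1", "n1", "n2", "n2", "n3", "n3", "n4", "n4"]))
# ... while 4+2+2 (two 2-slot chunks on n1) is rejected.
try:
    linda_host_list(["n1"] * 4 + ["n2"] * 2 + ["n3"] * 2)
except ValueError as err:
    print(err)
```

Collapsing the list first also avoids the "node appears twice" complaint; the remaining question is only whether the per-node counts are uniform.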

-- Reuti

