[Beowulf] Do these SGE features exist in Torque?

Glen Beane glen.beane at jax.org
Tue May 13 04:25:09 PDT 2008


On May 13, 2008, at 6:17 AM, Bogdan Costescu wrote:

> On Mon, 12 May 2008, Glen Beane wrote:
>
>> I know TORQUE USED to be much better than SGE at controlling MPI  
>> type jobs.
>
> I think that it still is, due to the long-awaited but still not  
> existing TM support in SGE.
>
>> If you use a PBS/TORQUE aware MPI job launcher it is pretty much  
>> impossible for any of the job processes to escape control of the  
>> batch system.
>
> Hmm, not quite true. I've had just recently several such instances  
> where I had to kill individual processes by hand (using Torque  
> 2.1.10). One nice thing about SGE is its use of setgroups() to set  
> additional groups from a reserved range on all the processes of  
> a job; as this call is normally only available to "root", it's  
> impossible for user processes to modify the additional groups list  
> and escape being killed; I used SGE in the past and don't remember  
> ever having to clean up processes by hand.
>
> [ Please note that I'm taking here into consideration only the  
> batch system proper and not any kind of prologue/epilogue scripts  
> which are the usual fixes that are applied locally. IMHO job  
> cleanup is a basic functionality that should be included in the  
> batch system proper. ]

In TORQUE I've never had a problem with TM-spawned processes not  
getting cleaned up, and there is no universal way for TORQUE to know  
about and clean up anything spawned outside of TM (such as with an  
ssh-based MPI job launcher).  If TM-spawned processes were not getting  
cleaned up I would log it as a bug.
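
For reference, the setgroups() tagging described in the quoted text above can be illustrated with a rough sketch. This is not SGE's or TORQUE's actual code; it only shows how a cleanup pass might find every process that carries a job's reserved supplementary group, assuming a Linux /proc layout with a "Groups:" line in /proc/<pid>/status. The function names and the GID value 20005 are hypothetical.

```python
import os

def supplementary_gids(status_text):
    """Parse the 'Groups:' line of a /proc/<pid>/status file into a set of GIDs."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return {int(tok) for tok in line.split()[1:]}
    return set()

def pids_tagged_with(job_gid, proc_root="/proc"):
    """Return the PIDs under proc_root whose supplementary groups include job_gid.

    job_gid stands in for the per-job GID taken from a reserved range
    (hypothetical value; the real range is a site configuration choice).
    """
    tagged = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                if job_gid in supplementary_gids(fh.read()):
                    tagged.append(int(entry))
        except OSError:
            continue  # process exited or is not readable; ignore it
    return sorted(tagged)

if __name__ == "__main__":
    # Hypothetical per-job GID; on a real node this would come from the batch system.
    print(pids_tagged_with(20005))
```

Because an unprivileged process cannot change its own supplementary group list, a scan like this would still find processes started through ssh rather than TM, as long as they inherited the tagged group list from a job process.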



