[Beowulf] Do these SGE features exist in Torque?

Bogdan Costescu Bogdan.Costescu at iwr.uni-heidelberg.de
Tue May 13 03:17:11 PDT 2008

On Mon, 12 May 2008, Glen Beane wrote:

> I know TORQUE USED to be much better than SGE at controlling MPI 
> type jobs.

I think that it still is, because the long-awaited TM support in SGE 
still does not exist.

> If you use a PBS/TORQUE aware MPI job launcher it is pretty much 
> impossible for any of the job processes to escape control of the 
> batch system.

Hmm, not quite true. Just recently I've had several such instances 
where I had to kill individual processes by hand (using Torque 
2.1.10). One nice thing about SGE is its use of setgroups() to set 
an additional group from a reserved range on all the processes of a 
job. As this call is normally only available to "root", it's 
impossible for user processes to modify the additional groups list and 
escape being killed. I used SGE in the past and don't remember ever 
having to clean up processes by hand.
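
To illustrate the idea (a rough sketch only, not SGE's actual code): on 
Linux the additional groups of a process show up on the "Groups:" line 
of /proc/<pid>/status, so everything that belongs to a job can be found 
by looking for the job's tag GID. The function names and the GID value 
20005 below are made up for the example.

```python
import os


def supplementary_gids(status_text):
    """Parse the 'Groups:' line from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            # Everything after the label is a whitespace-separated GID list.
            return {int(tok) for tok in line.split()[1:]}
    return set()


def pids_in_group(tag_gid, proc_root="/proc"):
    """Return the PIDs whose additional groups include tag_gid."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                text = fh.read()
        except OSError:
            continue  # process exited or is not readable
        if tag_gid in supplementary_gids(text):
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    # 20005 is a hypothetical GID taken from the reserved range.
    print(pids_in_group(20005))
```

A cleanup step would then signal every PID in that list. Since an 
unprivileged process cannot drop the tag GID, forking or changing the 
process group does not hide it from such a scan.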

[ Please note that I'm considering here only the batch system proper 
and not any kind of prologue/epilogue scripts, which are the usual 
fixes that are applied locally. IMHO job cleanup is a basic 
functionality that should be included in the batch system proper. ]

> Last time I used SGE, I found the MPI support much less 
> sophisticated than TORQUE, but this was several years ago.

This is easy to explain once you look at how they both started. 
Generally speaking, however, I can see that over the past few years 
they have started to grow similar features (e.g. SGE is getting 
better parallel job integration and possibly TM support, while Torque 
is getting job-array support).

