[Beowulf] Do these SGE features exist in Torque?
Reuti
reuti at staff.uni-marburg.de
Mon May 12 08:19:58 PDT 2008
Hi,
On 12.05.2008 at 15:14, Prentice Bisbal wrote:
>>> At a previous job, I installed SGE for our cluster. At my current job
>>> Torque is the queuing system of choice. I'm very familiar with SGE, but
>>> only have a cursory knowledge of Torque (installed it for evaluation,
>>> and that's it). We're about to purchase a new cluster. I'd have to make
>>> a good argument for using SGE over Torque. I was wondering if the
>>> following SGE features exist in Torque:
>>>
>>> 1. Interactive shells managed by queuing system
>>> 2. Counting licenses in use (done using a contributed shell script in
>>> SGE)
>>> 3. Separation of roles between submit hosts, execution hosts, and
>>> administration hosts
>>> 4. Certificate-based security.
>>>
>>> Are there any notable features available in Torque that aren't available
>>> in SGE?
>>
>> what you can find in Torque but not in SGE: request a mixture of nodes,
>> i.e. one heavy node with much memory (or big I/O options) and 5 nodes
>> with less memory or less disk performance for a parallel job.
>
> Huh? Can you elaborate? My initial thought is "why would you need
> this?", but I think I see where you're going with this...
For some types of applications only the "master" of a parallel job
collects all the information and accesses the disk to store the
results. Therefore this, and only this, machine needs better I/O
capabilities than the slave nodes.
Requesting an arbitrary combination of resources is still an RFE in
SGE. E.g. if one job needs 1 host with big I/O, 2 with huge memory and
3 "standard" nodes, in Torque you could request:
-l nodes=1:big_io+2:mem+3:standard
(Although this syntax has its own pitfalls: -l nodes=4:ppn=1 might
still allocate 2 or more slots on a node, as far as I observed in my
tests.)
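To make the structure of such a request explicit, here is a minimal Python sketch that splits the value after "nodes=" into (count, properties) chunks. It is only an illustration, not Torque's own parser: the function name parse_nodes_spec is made up, and it assumes every chunk starts with a node count (it does not handle a hostname in that position).

```python
def parse_nodes_spec(spec):
    """Split a Torque-style nodes value into (count, properties) chunks.

    Assumption for this sketch: chunks are joined by '+', and each chunk
    is '<count>:<property>[:<property>...]'.
    """
    chunks = []
    for part in spec.split("+"):
        fields = part.split(":")
        count = int(fields[0])   # number of nodes of this kind
        props = fields[1:]       # properties such as 'big_io' or 'ppn=1'
        chunks.append((count, props))
    return chunks


# The request from the text: 1 big-I/O host, 2 big-memory hosts, 3 standard.
chunks = parse_nodes_spec("1:big_io+2:mem+3:standard")
total_nodes = sum(count for count, _ in chunks)
print(chunks)
print(total_nodes)
```

Each chunk is an independent group of nodes, which is what lets a single job ask for several node types at once.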
As Craig correctly pointed out, it can be set up in SGE, but there
might be combinations where this gets convoluted. Thankfully I never
needed such a type of allocation of nodes, but I just wanted to point
out that this feature exists in Torque. If you don't need it for your
type of jobs, ignore it ;-)
BTW @Craig: In 6.0 it should be possible to request
"-masterq qbigmem*@bigmem*", which would shorten the line.
>>
>> OTOH, if you have parallel jobs:
>> http://www.beowulf.org/archive/2007-September/019269.html
>
> Thanks for the link. From my understanding of SGE, you can get tight
> integration with just about any MPI implementation. Is that true?
At least for all that I'm aware of. Nowadays I would go for Open MPI,
which should work out-of-the-box with both queuing systems. If you
need Linda or HP-MPI, it seems SGE is the only option for a Tight
Integration (of these two; I'm not aware of the features of LSF and
others).
>> What is different between them conceptually: in Torque you submit a job
>> into a queue, while in SGE you request resources and SGE will select an
>> appropriate queue for you.
>
> You'll have to elaborate on this, too. From my knowledge of SGE, you had
> to specify the correct queue, too, or it went into the default queue.
There is nothing like a default queue in SGE (even the all.q, defined
at installation time, is just a queue without any special features -
you can edit or remove it if you don't want it in your cluster).
Suppose you have some queues with different limits, e.g. one with 1h
wallclock vs. one with unlimited wallclock, and two queues for 8 GB
vs. 16 GB. A job requesting 2 hrs of wallclock time will for sure end
up in the queue with unlimited time, while a job requesting 30 minutes
might end up in either of the two queues if a slot is free. The same
holds for memory requests: a job requesting 4 GB might end up in
either of the two memory-limited queues, while a 12 GB request can
only run in the 16 GB queue. You don't have to specify the queue, as
SGE will select one which fulfills your requested resources.
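The matching idea can be sketched in a few lines of Python. This is only a toy model of "pick any queue whose limits cover the request", not SGE's actual scheduler; the queue names, the resource keys (wallclock_h, mem_gb) and the function eligible_queues are all hypothetical.

```python
import math


def eligible_queues(queues, request):
    """Return the names of queues whose limits cover every requested resource.

    A resource missing from a queue's limits is treated as unlimited.
    """
    matches = []
    for name, limits in queues.items():
        if all(amount <= limits.get(resource, math.inf)
               for resource, amount in request.items()):
            matches.append(name)
    return matches


# Hypothetical queues mirroring the example in the text.
time_queues = {"short.q": {"wallclock_h": 1}, "long.q": {}}
mem_queues = {"mem8.q": {"mem_gb": 8}, "mem16.q": {"mem_gb": 16}}

print(eligible_queues(time_queues, {"wallclock_h": 2}))    # only long.q fits
print(eligible_queues(time_queues, {"wallclock_h": 0.5}))  # both queues fit
print(eligible_queues(mem_queues, {"mem_gb": 4}))          # both queues fit
print(eligible_queues(mem_queues, {"mem_gb": 12}))         # only mem16.q fits
```

The job only states what it needs; which of the eligible queues it lands in depends on where a slot is free.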
--- Reuti
> In SGE you can specify resources such as mem >= 32 GB for a node, or
> arch=AMD64. You can't do this with Torque? Seems like a very basic
> queuing system feature.