[Beowulf] Queue Systems
Reuti
reuti at staff.uni-marburg.de
Thu Sep 6 14:16:47 PDT 2007
On 06.09.2007 at 18:27, Chris Dagdigian wrote:
>
> { Declaration of bias; I run the http://gridengine.info site in my
> spare time ... }
>
> I'm quite familiar with both LSF and SGE, using both products in my
> professional work and helping clients with queue system selection,
> deployment, application integration and training. I'm less
> familiar with PBS/Torque/etc. having only run those in small
> virtualized lab environments. At the time when I was looking at
> open source solutions, none of the PBS variants supported array
> jobs so I went with SGE and never looked back.
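For anyone who has not used array jobs: a single submission fans out into many numbered tasks, and each task works out its own share of the input. A minimal Python sketch of the task side follows, assuming SGE exports the task number in the SGE_TASK_ID environment variable (with a non-numeric value for non-array jobs). The file names and the helper functions are made up for illustration.

```python
import os


def current_task_id(env=None):
    """Return the 1-based array task number, or 1 outside an array job.

    Assumption: the scheduler exports SGE_TASK_ID; any non-numeric
    value (or a missing variable) is treated as "not an array job".
    """
    env = os.environ if env is None else env
    raw = env.get("SGE_TASK_ID", "")
    return int(raw) if raw.isdigit() else 1


def pick_input(inputs, task_id):
    """Map a 1-based task number onto one entry of the input list."""
    if not 1 <= task_id <= len(inputs):
        raise IndexError(f"task {task_id} has no input (have {len(inputs)})")
    return inputs[task_id - 1]


if __name__ == "__main__":
    # Hypothetical input files, one per array task.
    files = ["sample_a.dat", "sample_b.dat", "sample_c.dat"]
    print(pick_input(files, current_task_id()))
```

Submitted with something like "qsub -t 1-3" around a wrapper script, each of the three tasks would then pick a different file.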
Another point is tight integration of parallel jobs, which PBS/Torque
offers for LAM/MPI and Open MPI, but not for HP-MPI, Linda or PVM. You
can of course still run those under PBS/Torque, but the slave
processes are then not controlled by the queuing system, and you will
not get correct accounting for them. SGE offers an rsh replacement
called qrsh, which supports these parallel environments as well.
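
To illustrate the idea, here is a rough Python sketch of what a start-up wrapper does under tight integration: it reads the host list granted to the job and starts each slave through "qrsh -inherit" instead of plain rsh, so the slave stays under the control of the queuing system. It assumes the host file format of SGE's $PE_HOSTFILE (host name, slot count, queue, processor range per line). The host names and the ./worker command are placeholders, and the sketch only builds the command lines, it does not execute them.

```python
def parse_pe_hostfile(text):
    """Return (host, slots) pairs from PE_HOSTFILE-style text.

    Assumed line format: "<host> <slots> <queue> <processor range>".
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        hosts.append((fields[0], int(fields[1])))
    return hosts


def qrsh_commands(hosts, worker_argv):
    """Build one 'qrsh -inherit <host> <command>' argv per granted slot."""
    commands = []
    for host, slots in hosts:
        for _ in range(slots):
            commands.append(["qrsh", "-inherit", host, *worker_argv])
    return commands


if __name__ == "__main__":
    # Made-up example of what the scheduler might hand to a 3-slot job.
    sample = (
        "node01 2 all.q@node01 UNDEFINED\n"
        "node02 1 all.q@node02 UNDEFINED\n"
    )
    for argv in qrsh_commands(parse_pe_hostfile(sample), ["./worker"]):
        print(" ".join(argv))
```

Because every slave is started through the queuing system, it can be killed along with the job and its resource usage ends up in the accounting record.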
-- Reuti
> The current state of the art is quite good. For 90% of use cases
> and end-user requirements you really can't go wrong with any of the
> available products.
>
> Everything out there (open source or commercial) is capable of
> doing the standard sort of "policy based resource management on
> distributed systems" that we all care about.
>
> So with all products capable of doing just about everything you
> would need, making an actual product selection comes down to areas
> other than the functionality of the queueing core.
>
> Things like:
>
> - Administrative burden (if keeping PBS from falling over requires
> a full-time employee, the cost of LSF looks far more attractive,
> for instance ...)
> - Cost
> - Quality of support
> - Quality of technical documentation
> - Quality of training / professional services
> - Layered products that enhance base functionality
>
> Platform LSF is the gold standard. Low administrative burden, great
> documentation/support and resiliency features that competitors
> still have a tough time matching and all wrapped up with additional
> (at extra cost of course) layered products that nobody else can
> really touch. The downside? Cost of course. In particular the
> current Linux pricing model punishes you for putting more than 4GB
> of RAM into a compute node or using a non X86/X86_64 architecture
> -- in both cases you'll get bounced out of the "cheap" license
> category and into a far more expensive one where the cost of the
> software license is in the same ballpark as the cost of the server
> hardware.
>
> Platform will happily sell you additional layered products that can
> do things like:
>
> - Tight integration with FlexLM license servers; more powerful than
> the standard load sensor (SGE) and elim (LSF) methods that people
> do "for free"
> - Seriously hardcore reporting and analytic tools suitable for the
> largest enterprises
> - Tight integration with parallel environments and high speed
> interconnects (plus support for these environments which is non-
> trivial)
> - SLA-aware scheduling
> - Multi-cluster aware scheduling
> - etc. etc.
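
As a concrete picture of the "for free" load sensor method mentioned in the list above, here is a minimal Python sketch of an SGE-style load sensor. It assumes the usual protocol: the execution daemon writes a line to the sensor's stdin whenever it wants a report (and "quit" to stop), and the sensor answers with a begin/end block of host:resource:value lines, using "global" as the host for cluster-wide values. The resource name flexlm_free and the function count_free_licenses are hypothetical; a real sensor would query the license server at that point.

```python
import io
import sys


def count_free_licenses():
    """Hypothetical probe; a real sensor would ask the license server here."""
    return 0


def format_report(host, values):
    """Build one report block: 'begin', host:resource:value lines, 'end'."""
    lines = ["begin"]
    for name in sorted(values):
        lines.append(f"{host}:{name}:{values[name]}")
    lines.append("end")
    return "\n".join(lines) + "\n"


def run_sensor(stdin, stdout, probe=count_free_licenses):
    """Answer each request line with a report until 'quit' arrives."""
    for line in stdin:
        if line.strip() == "quit":
            break
        stdout.write(format_report("global", {"flexlm_free": probe()}))
        stdout.flush()  # the daemon reads the report as soon as it is written


if __name__ == "__main__":
    # Demo with simulated input; a deployed sensor would pass sys.stdin.
    run_sensor(io.StringIO("\n\nquit\n"), sys.stdout)
```

The scheduler can then treat the reported value like any other resource when deciding whether a job that needs a license may start. What this cannot do is reserve a license between the report and the job start, which is presumably where the commercial integration earns its money.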
>
> The base version of LSF also ships with a basic reporting module
> and a tomcat-driven web interface that is suitable for users
> (submit and monitor jobs) as well as admins (manage queues and
> hosts). SGE in particular does not really have anything like this
> except for ARCo on the reporting side, and ARCo is no match for even
> the "free" reporting module you get with LSF 7.x.
>
> That said though, it's been my experience that a vast majority of
> the "market" does not need and will not likely ever need some of
> the advanced/enterprise level add-ons that integrate so cleanly
> with the base Platform LSF products.
>
> So this drops me back down into my original argument that just
> about any of the available products will perform well at doing what
> you need. The key advice I have is to understand that everyone is
> pretty good at the basic functions so you'll have to make your
> selection decision based on some of the other criteria I tried to
> list above.
>
>
> My general rule of thumb for new projects is to start with the
> assumption that I'll be using Grid Engine. Then, once a more formal
> understanding of the workflows and customer requirements has been
> reached, it may become clear that Platform LSF is a better choice.
>
> For all of 2007, my guess is that I've worked on 20+ Grid Engine
> systems and deployed LSF just once for a large enterprise customer.
>
>
> My $.02 of course!
>
> Regards,
> Chris (posting from my non-corporate address)
>
> On Sep 6, 2007, at 5:30 AM, andrew holway wrote:
>
>> Hi,
>>
>> We are trying to work out the differences between these queue
>> systems.
>>
>> Can anyone shed any light? Pros and Cons...
>>
>> SGE, Torque (with Maui), PBSPro and LSF
>>