[Beowulf] [External] Anyone have docs/URLs/resources covering Grid Engine vs SLURM deltas or migration?

Fri Jan 29 22:02:34 UTC 2021

Chris,

I remember you from the various SGE mailing lists. Welcome to the Slurm 
club. You might want to join the Slurm mailing list and post this 
question there as well. I'm sure there are plenty of Slurm admins who used to 
be SGE admins. The Slurm documentation at slurm.schedmd.com is very 
good. This link may help with submission options and basic user commands:

https://slurm.schedmd.com/rosetta.pdf

but I don't think you'll find a simple side-by-side explanation of the 
differences in architecture and admin commands.
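As a rough starting point for the submission side, here is a minimal sketch of how a few common SGE qsub requests (job name, queue, runtime, slots, memory, GPUs, array jobs) map onto sbatch options, plus the basic user commands. It is not the Rosetta Stone's own table. The helper name translate_qsub is hypothetical, the parallel environment name "smp" and the GPU complex "gpu" are site-specific assumptions (SGE has no built-in GPU resource), and the mapping is approximate, so check it against the PDF above and your site's configuration.

```python
# Hypothetical helper: approximate SGE -> Slurm translation for common requests.
# Assumes a shared-memory PE named "smp" and a site-defined consumable "gpu"
# complex on the SGE side; both names vary from site to site.

# Basic user commands: SGE command -> closest Slurm equivalent.
COMMANDS = {
    "qsub": "sbatch",
    "qstat": "squeue",
    "qdel": "scancel",
    "qhost": "sinfo -N",
    "qrsh": "srun --pty bash",  # one common way to get an interactive shell
}


def translate_qsub(job_name=None, queue=None, runtime=None, slots=None,
                   mem_per_slot=None, gpus=None, array=None):
    """Return a list of sbatch arguments for an SGE-style resource request."""
    args = []
    if job_name:        # SGE: -N align
        args.append(f"--job-name={job_name}")
    if queue:           # SGE: -q short   (queues roughly become partitions)
        args.append(f"--partition={queue}")
    if runtime:         # SGE: -l h_rt=01:00:00
        args.append(f"--time={runtime}")
    if slots:           # SGE: -pe smp 8  (threaded job on a single node)
        args.append(f"--cpus-per-task={slots}")
    if mem_per_slot:    # SGE: -l h_vmem=4G is per slot, so use a per-CPU limit
        args.append(f"--mem-per-cpu={mem_per_slot}")
    if gpus:            # SGE: -l gpu=1   (site-defined consumable complex)
        args.append(f"--gres=gpu:{gpus}")
    if array:           # SGE: -t 1-10
        args.append(f"--array={array}")
    return args
```

For example, `" ".join(["sbatch"] + translate_qsub(slots=8, mem_per_slot="4G", gpus=1) + ["job.sh"])` yields an sbatch command line requesting 8 CPUs, 4G per CPU, and one GPU. The main design point is memory: SGE's h_vmem is a per-slot limit, so --mem-per-cpu keeps the same total (8 x 4G) rather than --mem, which is per node.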

It's a shame the life sciences are so resistant to change. I know 
exactly what you're talking about. When I first started supporting life 
sciences, they were still insisting on SGI workstations when everyone 
else in the scientific computing world had switched to Linux, and a 
Linux workstation with a Quadro FX card could run circles around the SGI 
workstations at a quarter of the cost. And don't get me started on the 
cost/performance comparison between an 8-processor Origin server and a 
1U Linux server!

And yeah, the last cluster I supported using SGE was doing genomics work.

Good luck on your transition.

Prentice

On 1/29/21 3:12 PM, Chris Dagdigian wrote:
> Hi folks,
>
> Those who know me from my day job and my other email address know that 
> I've been a hardcore SGE person for a decade+ now. It feels bad to even 
> write this email, heh
>
> At my company we've been having a great internal discussion 
> about SGE and SLURM, the specific trigger being our widespread use of 
> AWS ParallelCluster for auto-scaling compute farms on AWS and the 
> decision by AWS to deprecate the open source SGE distributed with the 
> stack at some point in the future.
>
> Related to this is the longstanding use of SGE in the life sciences -- 
> there are genome sequencers and other wet lab instruments that ship 
> natively with SGE support from the vendor so there is going to be a 
> long tail of SGE still in use or targeted for use in biotech and 
> pharma spaces.
>
> I think there is still a future for commercial SGE, especially after 
> the Univa -> Altair tie-up, but that still leaves the poor 
> orphaned/forked open source SGE distros kinda hanging out there 
> with no real updates or improvements in ages, so I understand 
> the AWS HPC folks' desire to pare down their supported scheduler stack.
>
> I want to build up my own knowledge and prep for our own increased use 
> of SLURM on AWS.  I'm very comfortable with SGE architecture, 
> operational philosophy and capabilities but I lack similar info for 
> modern SLURM.   I'm ready and willing to start from scratch to build 
> my own transition and "differences between SGE/SLURM" documentation 
> but was wondering who out there has made this transition before and if 
> there are any public domain FAQs, wikis, technical writeups or other 
> guidance that I can learn from.
>
> If I can manage to put my own materials together and they look 
> sensible I will plan on publishing them openly. Thanks!
>
> So far the internal conversation we are having is centering on the 
> differences in resource-based job scheduling when there are specific 
> needs to declare required resources up front, like GPUs or memory 
> requirements. Most of the differences beyond basic queue/partition 
> design seem to center around the minutiae of scheduling and placing 
> jobs, but I'm sure I'm missing other, larger areas.
>
>
> Regards
> Chris
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit 
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf

-- 
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
http://www.pppl.gov