[Beowulf] job scheduler and accounting question

Tue Jul 14 12:06:18 PDT 2015

Hi folks:

   It's been a few years since we've had a good use case for a job 
scheduler, and I'll freely admit I've not paid nearly enough attention 
to what is currently out there.

   We are investigating options for a cluster/cloud scenario where I 
need to keep track of CPU, memory, and disk usage during the runs.  This 
"keeping track" should be available via command line tools (preferably 
with JSON/XML/CSV output that I can easily parse).

   The last time we did anything in this space, I used Torque and wrote 
my own account summary tool: 
https://scalability.org/2011/03/quick-accounting-tool-for-torque/ , and 
prior to that, I did something for SGE 
https://arc.liv.ac.uk/pipermail/gridengine-users/2006-October/011846.html

   Main requirements on the scheduler are

a) shell access.  We need to be able to quickly launch a shell and 
limit CPU/memory usage.  Cgroup control/monitoring would be terrific.

b) the aforementioned accounting/usage bits.  Happy to write my own data 
extractor (likely will need to for this project anyway) as long as I can 
get the data via CLI/API/...
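
To make the "easily parse" requirement concrete, here is a minimal sketch of the kind of extractor I have in mind.  It assumes the scheduler can emit delimited text with a header row; the column names and the sample records below are made up for illustration and do not come from any particular scheduler.

```python
import csv
import io
import json

# Hypothetical accounting dump: pipe-delimited with a header row.
# Column names and values are invented for this sketch.
SAMPLE = """JobID|User|ElapsedSec|CPUSec|MaxRSSKB|DiskReadKB
1001|alice|120|110|204800|5120
1002|bob|60|30|102400|1024
1003|alice|30|25|51200|2048
"""


def summarize(text, delimiter="|"):
    """Aggregate per-user job count, CPU seconds, peak RSS and disk reads."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    totals = {}
    for row in reader:
        user = totals.setdefault(
            row["User"],
            {"jobs": 0, "cpu_sec": 0, "max_rss_kb": 0, "disk_read_kb": 0},
        )
        user["jobs"] += 1
        user["cpu_sec"] += int(row["CPUSec"])
        # Memory is tracked as a peak, not a sum.
        user["max_rss_kb"] = max(user["max_rss_kb"], int(row["MaxRSSKB"]))
        user["disk_read_kb"] += int(row["DiskReadKB"])
    return totals


if __name__ == "__main__":
    # Emit JSON so downstream tools can consume the summary directly.
    print(json.dumps(summarize(SAMPLE), indent=2, sort_keys=True))
```

In practice the text would come from the scheduler's accounting command rather than a hard-coded string, and the field names and units would need to be adapted to whatever that tool actually reports.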

Ones I think I should be looking at include:

1) SLURM
2) OpenLava
3) Torque

What else?  Has the gridengine mess ever been sorted out?  And on a 
related note, are there any updated pages listing pros/cons of the 
modern implementations of these?  Again, I've not paid attention to 
schedulers for a while, so things may have changed a bit in a few years ...

Thx!

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: landman at scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615