[Beowulf] Interactive vs batch, and schedulers
David Mathog
mathog at caltech.edu
Fri Jan 17 09:52:47 PST 2020
On Thu, 16 Jan 2020 23:24:56 "Lux, Jim (US 337K)" wrote:
> What I’m interested in is the idea of jobs that, if spread across many
> nodes (dozens) can complete in seconds (<1 minute) providing
> essentially “interactive” access, in the context of large jobs taking
> days to complete. It’s not clear to me that the current schedulers
> can actually do this – rather, they allocate M of N nodes to a
> particular job pulled out of a series of queues, and that job “owns”
> the nodes until it completes. Smaller jobs get run on (M-1) of the N
> nodes, and presumably complete faster, so it works down through the
> queue quicker, but ultimately, if you have a job that would take, say,
> 10 seconds on 1000 nodes, it’s going to take 20 minutes on 10 nodes.
Generalizations are prone to failure but here we go anyway...
If there is enough capacity and enough demand for both classes of jobs,
one could set up a separate queue for each type, keeping the big jobs
and the small ones apart while maintaining pretty much constant
utilization.
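For example, with Slurm that could be two partitions over the same pool
of nodes. (A hypothetical slurm.conf fragment; the partition names,
node range, and limits are made up, just to show the idea.)

    # short MaxTime keeps the "quick" pool turning over rapidly, while
    # batch jobs can run for days on the same hardware
    PartitionName=quick Nodes=node[001-064] MaxTime=00:05:00   Default=NO  State=UP
    PartitionName=batch Nodes=node[001-064] MaxTime=7-00:00:00 Default=YES State=UP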
In some instances it may be possible to define the benefit (in some
unit, let's say dollars) of obtaining a given job's results, and also
the costs (in the same units) of node-hours, wait time, and other
resources. Using that function it might be possible to schedule the
job mix to maximize "value", at least approximately. Based solely on
times and nodes, without any measure of benefit and cost, one might
still optimize node utilization (by some measure), but spinning the
CPUs isn't really the point of the resource, right? I expect that
whatever job mix maximizes value will also come close to maximizing
utilization, but not necessarily the other way around. I bet that
AWS's scheduler uses some sort of value calculation like that.
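A minimal sketch of that idea in Python (the benefit and cost figures
and the greedy policy are invented for illustration; a real scheduler
would be far more sophisticated):

    # Value-based scheduling sketch: start whichever waiting job adds
    # the most net value right now.  All numbers are made up.
    def job_value(job, start_hour, node_hour_cost=1.0, wait_cost=0.5):
        """Net value (dollars) of running `job` starting at `start_hour`."""
        compute_cost = job["nodes"] * job["hours"] * node_hour_cost
        wait_penalty = start_hour * wait_cost
        return job["benefit"] - compute_cost - wait_penalty

    def greedy_schedule(queue, total_nodes=1000):
        t, free, order, pending = 0.0, total_nodes, [], list(queue)
        while pending:
            runnable = [j for j in pending if j["nodes"] <= free]
            if not runnable:
                t += 1.0            # simplification: an hour later the
                free = total_nodes  # running jobs have all finished
                continue
            best = max(runnable, key=lambda j: job_value(j, t))
            order.append((best["name"], t))
            free -= best["nodes"]
            pending.remove(best)
        return order

    jobs = [{"name": "big",   "nodes": 900, "hours": 48.0, "benefit": 5000.0},
            {"name": "quick", "nodes": 100, "hours": 0.01, "benefit": 50.0}]
    print(greedy_schedule(jobs))    # the quick job is started first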
A somewhat related problem occurs when there are slow jobs which use a
lot of memory but cannot benefit from all the CPUs on a node (i.e.,
they scale poorly). Better utilization is possible if CPU-efficient,
low-memory jobs can run at the same time on those nodes, using the
"spare" CPUs. Done just right this is a win-win, with both jobs
running at close to their optimal speeds. It is tricky, though: if
the total memory usage cannot be calculated ahead of time to be sure
there is enough, the two jobs can end up fighting over that resource,
with run times going way up when page faulting sets in, or jobs
crashing when the system runs out of memory.
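A sketch of the admission test, again in Python with made-up node
sizes and job numbers; the point is to backfill the low-memory job
only when the declared memory of both jobs fits in RAM with some
headroom:

    # Admission test for sharing a node.  The "headroom" factor guards
    # against page faulting or the OOM killer if either job's memory
    # estimate runs a little low.
    def can_share_node(resident, candidate,
                       node_cpus=32, node_mem_gb=256, headroom=0.9):
        cpus_ok = resident["cpus"] + candidate["cpus"] <= node_cpus
        mem_ok = (resident["mem_gb"] + candidate["mem_gb"]
                  <= headroom * node_mem_gb)
        return cpus_ok and mem_ok

    slow_big_mem = {"cpus": 4,  "mem_gb": 200}  # scales poorly, memory hungry
    fast_low_mem = {"cpus": 24, "mem_gb": 10}   # CPU efficient, small footprint
    print(can_share_node(slow_big_mem, fast_low_mem))               # True
    print(can_share_node(slow_big_mem, {"cpus": 24, "mem_gb": 60})) # False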
Regards,
David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech