[Beowulf] SGE + policy
Orion Poplawski orion at cora.nwra.com
Thu May 27 08:07:58 PDT 2004
Robert G. Brown wrote:
>
> Primary tasks: matlab and stata jobs, run either interactively/remote
> or (more likely) in batch mode. Jobs include both "short" jobs that
> might take 10-30 minutes run by e.g. 1-2nd year graduate students as
> part of their coursework and "long" jobs that might take hours to days
> run by more advanced students, postdocs, faculty.
>
> Constraint: matlab requires a license managed by a license manager.
> There are a finite number of licenses (currently less than the number of
> CPUs) spread out across the pool of CPUs.
>
> Concern: That long running jobs will get into the queue (probably SGE
> managed queue) and starve the short running jobs for either licenses or
> CPUs or both. Students won't be able to finish their homework in a
> timely way because long running jobs de facto hog the resource once they
> are given a license/CPU.
>
> I am NOT an SGE expert, although I've played with it a bit and read a
> fair bit of the documentation. SGE appears to run in FIFO mode, which of
> course would lead to precisely the sort of resource starvation feared, or
> equal share mode. Equal share mode appears to solve a different
> resource starvation problem -- that produced by a single user or group
> saturating the queue with lots of jobs, little or big, so that others
> submitting after they've loaded the queue have to wait days or weeks to
> get on. However, it doesn't seem to have anything to do with job
> >>control<< according to a policy -- stopping a long running job so that
> a short running job can pass through.
>
> It seems like this would be a common problem in shared environments with
> a highly mixed workload and lots of users (and indeed is the problem
> addressed by e.g. the kernel scheduler in almost precisely the same
> context on SMP or UP machines). Recognizing that the license management
> problem will almost certainly be beyond the scope of any solution
> without some hacking and human-level policy, are there any well known
> solutions to this well known problem? Can SGE actually automagically
> control jobs (stopping and starting jobs as a sort of coarse-grained
> scheduler to permit high priority jobs to pass through long running low
> priority jobs)? Is there a way to solve this with job classes or
> wrapper scripts that is in common use?

Your biggest problem (as you say) will be licenses.

I believe the scheduler tries to evenly allocate the running jobs among
the different submitters. So, if you start with 14 empty slots and one
person submits 500 jobs, all 14 slots get filled with that person's jobs.
But if someone else then submits 20 jobs, that person will be given the
next 7 slots as running jobs complete, and the slots will stay split
until the second user has no more jobs.

You can also set up subordinate queues, where the queue with priority
will only accept jobs that will take less than a given time to run. When
a short job gets submitted, a long job on the subordinate queue will be
stopped (SIGSTOP) while the short job runs. Your problem here is that
the stopped long job will presumably still hold its license. If matlab
has some kind of checkpointing function, you could tie that into SGE to
release the license.

You can also limit the number of available slots for long running jobs.

SGE can tie into license management through resource monitors to
determine the number of licenses available.

-- 
Orion Poplawski
System Administrator                   303-415-9701 x222
Colorado Research Associates/NWRA      FAX: 303-415-9702
3380 Mitchell Lane, Boulder CO 80301   http://www.co-ra.com
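
The slot-sharing behaviour described in the reply (14 slots, one user with
500 jobs, a second user arriving with 20) can be sketched with a toy model.
This is not SGE's scheduler. It assumes only that the oldest running job
finishes first and that a freed slot goes to the waiting user with the
fewest running jobs. The user names "alice" and "bob" and the function
simulate_equal_share are hypothetical, introduced purely for illustration.

```python
from collections import Counter, deque


def simulate_equal_share(slots, pending, steps):
    """Toy equal-share model (an assumption, not SGE's real algorithm).

    slots:   user names occupying each slot, oldest job first
    pending: dict mapping user -> number of jobs still waiting
    steps:   number of job completions to simulate
    Returns a list of dicts (running jobs per user), one per completion.
    """
    running = deque(slots)
    pending = dict(pending)  # work on a copy
    history = []
    for _ in range(steps):
        if not running:
            break
        running.popleft()  # the oldest running job finishes
        counts = Counter(running)
        waiting = [user for user, n in pending.items() if n > 0]
        if waiting:
            # Freed slot goes to the waiting user with the fewest running
            # jobs; ties are broken alphabetically by user name.
            user = min(waiting, key=lambda u: (counts[u], u))
            pending[user] -= 1
            running.append(user)
        history.append(dict(Counter(running)))
    return history


# alice's 500 jobs fill all 14 slots (486 left waiting); bob then submits 20.
history = simulate_equal_share(["alice"] * 14, {"alice": 486, "bob": 20}, steps=60)
print(history[6])   # split after 7 completions
print(history[-1])  # state once bob's 20 jobs have all run
```

In this model the slots reach a 7/7 split after seven completions, hold it
while bob still has jobs waiting, and drift back to alice once bob's queue
is empty, which matches the behaviour the reply describes.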
