[Beowulf] Suggestions for scheduling software
Reuti
reuti at staff.uni-marburg.de
Wed Aug 20 03:18:08 PDT 2008
Hi Stephen,
On 19.08.2008 at 11:20, stephen mulcahy wrote:
> Up to now we've been working with a 20 node cluster where we'd have
> the luxury of working without any scheduling or queuing software -
> the cluster is pretty much dedicated to running a single job and is
> manually invoked with mpirun.
>
> We're moving to a much larger cluster in the near future and are
> keen to keep the utilisation as high as possible. On the new
> cluster we have to run 2 distinct jobs - one is a long-running
> (weeks or possibly months) job and the other is a regular short
> running job (running in a few hours) which has to run at a specific
> time each day.
>
> We're currently looking at using SLURM for queuing up jobs on the
> system but I'm not sure if it will meet all of our needs here.
> Ideally, we'd have some system that would allow us to queue up the
> long-running job and a series of short-running jobs and the system
> would automatically suspend the long-running job when the short-
> running job is due to start, run the short-run job and then restart
> the long-running job.
>
> I expect we're not the only ones in this situation. Is SLURM the
> right tool for this job? If not, can anyone recommend other tools
> out there, preferably open source?
Normally I would refuse to answer as I'm biased, but as there are no
replies at all: I would suggest looking into SGE:
http://gridengine.sunsource.net/ The requested automatic suspend
feature is supported by setting up the long-queue, which holds the
long running jobs, as a subordinated queue of the short-queue. To
start the short running jobs every day at a fixed time, you could use
a calendar for the short-queue, which will be enabled for a few hours
every day and then drain again while the jobs finish.
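
As a rough illustration of how the two mechanisms interact, here is a
toy Python model. It is not SGE configuration and calls no SGE API; the
02:00-06:00 window and all names in it are made up for the example. It
only shows the intended behaviour: the calendar decides when waiting
short jobs may start, and the long job stays suspended for as long as
any short job is still running, including while the queue drains.

```python
from datetime import time

# Hypothetical daily window during which the short-queue is enabled.
WINDOW_START = time(2, 0)
WINDOW_END = time(6, 0)


def short_queue_enabled(now: time) -> bool:
    """Calendar: the short-queue starts new jobs only inside the window."""
    return WINDOW_START <= now < WINDOW_END


def long_job_state(short_jobs_running: int) -> str:
    """Subordination: the long job is suspended while short jobs run."""
    return "suspended" if short_jobs_running > 0 else "running"


def step(now: time, short_jobs_running: int, short_jobs_waiting: int):
    """Make one scheduling decision in the toy model.

    Returns (short_jobs_running, short_jobs_waiting, long_job_state).
    """
    if short_queue_enabled(now) and short_jobs_waiting > 0:
        # Window is open: all waiting short jobs are started.
        short_jobs_running += short_jobs_waiting
        short_jobs_waiting = 0
    # Outside the window nothing new starts, but running short jobs
    # keep going (the queue drains) and keep the long job suspended.
    return short_jobs_running, short_jobs_waiting, long_job_state(short_jobs_running)
```

For example, step(time(1, 0), 0, 1) leaves the short job waiting and
the long job running, while step(time(2, 0), 0, 1) starts the short job
and reports the long job as suspended.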
-- Reuti