Reuti reuti at staff.uni-marburg.de
Wed Aug 20 03:18:08 PDT 2008

Hi Stephen,

On 19.08.2008 at 11:20, stephen mulcahy wrote:

> Up to now we've been working with a 20 node cluster where we'd have  
> the luxury of working without any scheduling or queuing software -  
> the cluster is pretty much dedicated to running a single job and is  
> manually invoked with mpirun.
> We're moving to a much larger cluster in the near future and are  
> keen to keep the utilisation as high as possible. On the new  
> cluster we have to run 2 distinct jobs - one is a long-running  
> (weeks or possibly months) job and the other is a regular short  
> running job (running in a few hours) which has to run at a specific  
> time each day.
> We're currently looking at using SLURM for queuing up jobs on the  
> system but I'm not sure if it will meet all of our needs here.  
> Ideally, we'd have some system that would allow us to queue up the  
> long-running job and a series of short-running jobs and the system  
> would automatically suspend the long-running job when the  
> short-running job is due to start, run the short-running job and  
> then restart the long-running job.
> I expect we're not the only ones in this situation. Is SLURM the  
> right tool for this job? If not, can anyone recommend other tools  
> out there, preferably open source?

Normally I would refrain from answering as I'm biased, but as there
have been no replies at all: I would suggest looking into SGE:
http://gridengine.sunsource.net/
The automatic suspend feature you ask for is supported by making the
long-queue, which holds the long-running jobs, a subordinate queue of
the short-queue. To start the short-running jobs every day at a fixed
time, you could attach a calendar to the short-queue, so that the
queue is enabled for a few hours every day and then drains again
while the jobs finish.
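
A rough sketch of what this could look like, written from memory and
untested. The queue names short.q and long.q, the calendar name
daily_window and the 02:00-06:00 window are only placeholders, and the
exact syntax should be checked against queue_conf(5) and
calendar_conf(5) before use:

```
# Excerpt of the short queue's configuration (edit with: qconf -mq short.q).
# Queue and calendar names here are placeholders, not defaults.
qname              short.q
# Suspend long.q on a host as soon as 1 slot of short.q is in use there.
subordinate_list   long.q=1
# Attach the calendar defined below to this queue.
calendar           daily_window

# Calendar definition (add with: qconf -acal daily_window).
# As far as I recall, the listed periods are the times the queue is
# disabled, so this leaves short.q open from 02:00 to 06:00 every day.
calendar_name      daily_window
year               NONE
week               mon-sun=0-2,6-24
```

Jobs submitted to short.q outside the window would simply wait until
the calendar enables the queue again; the long-running job in long.q
is resumed once the short.q slots on its hosts are free.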

-- Reuti

