[Beowulf] slow jobs when run through queue

Chris Samuel chris at csamuel.org
Tue Dec 5 17:58:13 PST 2017


On 6/12/17 11:44 am, Nick Evans wrote:

> We have found that if we submit a job to the queue then it takes a long 
> time to process. ie. >4 hours
> If we are to run the exact same processing directly on the compute node 
> then it is significantly faster < 1 hour.

Some quick ideas:

Are you comparing a job that has asked for all cores and all RAM with
it running directly on the node?

Try using "perf top" to get an idea of what's going on with the node
when doing the comparison runs, perhaps "perf record" too, but I can
never remember if an unprivileged user can do that.  That might shed
some light.
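
On whether an unprivileged user can profile: on Linux this is governed
by the kernel.perf_event_paranoid sysctl, where lower values are more
permissive. A minimal sketch for reading it on the compute node follows.
The helper name read_perf_paranoid is made up for illustration, and what
each level permits varies by kernel and distro, so check the kernel
documentation rather than relying on this.

```python
from pathlib import Path


def read_perf_paranoid(path="/proc/sys/kernel/perf_event_paranoid"):
    """Return the perf_event_paranoid level as an int, or None if unreadable.

    Hypothetical helper: it only reports the value, it does not interpret it.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing (non-Linux, restricted container) or not an integer.
        return None


if __name__ == "__main__":
    level = read_perf_paranoid()
    if level is None:
        print("could not read perf_event_paranoid on this system")
    else:
        print(f"kernel.perf_event_paranoid = {level} (lower is more permissive)")
```

Running it once inside a batch job and once in a direct login on the
node also confirms both runs see the same setting.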

To me it sounds like it might be something that naively checks how
many cores a node has and then starts that many threads/processes.
If the batch job only asks for a single core, or fewer than all of
them, then you might end up with a lot of contention.
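
To illustrate the kind of mismatch I mean, here is a rough Python sketch
comparing the node's CPU count with the CPUs the current process is
actually allowed to run on. It assumes Linux, and it assumes the batch
system confines jobs with CPU affinity or cpusets; if yours does not,
both numbers will match even inside a job. The function names are only
for illustration.

```python
import os


def naive_worker_count():
    """CPU count a program sees if it simply asks how many the node has."""
    return os.cpu_count() or 1


def usable_worker_count():
    """CPUs this process may actually be scheduled on.

    os.sched_getaffinity is only available on some platforms (e.g. Linux),
    so fall back to the naive count where it is missing.
    """
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return naive_worker_count()


if __name__ == "__main__":
    naive = naive_worker_count()
    usable = usable_worker_count()
    print(f"node reports {naive} CPUs, this process may use {usable}")
    if naive > usable:
        # Starting one worker per reported CPU would oversubscribe the allocation.
        print(f"{naive} workers on {usable} CPUs is {naive / usable:.1f}x oversubscribed")
```

Run it once through the queue and once directly on the node. If the
queued run reports far fewer usable CPUs than the node has, an
application that sizes its thread pool from the node total would be
fighting itself for those few cores.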

Good luck!
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

