[Beowulf] slow jobs when run through queue

Nick Evans nick.c.evans at gmail.com
Tue Dec 5 21:47:42 PST 2017


Thanks Brian / Carl / Chris for the places to look. It turned out to be what
Chris had mentioned: the jobs were only requesting 1 CPU but were trying to
use all 48 in the machine.

We resubmitted the job requesting all CPUs, and it ran in the
expected amount of time.
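The failure above (a job that requested 1 CPU but started work on all 48) can be illustrated with a minimal Python sketch. Python is an assumption here; the thread never says what software the job ran. The sketch contrasts a naive core count (`os.cpu_count()`, which reports every logical CPU in the node) with the CPUs the process is actually allowed to use (`os.sched_getaffinity(0)` on Linux). It assumes the batch system confines jobs with a cpuset/affinity mask; if it limits CPU time with a cgroup quota instead, the affinity mask may still show every core. The helper names are hypothetical.

```python
import os
from typing import Optional


def machine_cpu_count() -> int:
    """Naive count: every logical CPU in the node (e.g. all 48),
    regardless of how many the batch job actually asked for."""
    return os.cpu_count() or 1


def job_cpu_count() -> int:
    """CPUs this process may actually run on.

    On Linux, a scheduler that pins jobs with a cpuset/affinity mask
    (assumed here) shrinks this set to the job's allocation, e.g. 1 core.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Platforms without sched_getaffinity (e.g. macOS) expose no mask,
    # so fall back to the machine-wide count.
    return machine_cpu_count()


def choose_worker_count(requested: Optional[int] = None) -> int:
    """Hypothetical helper: size a thread/process pool from the job's
    allocation instead of from the whole machine, and never exceed it."""
    allowed = job_cpu_count()
    if requested is None:
        return allowed
    return max(1, min(requested, allowed))


if __name__ == "__main__":
    print(f"CPUs in machine:       {machine_cpu_count()}")
    print(f"CPUs allocated (mask): {job_cpu_count()}")
    print(f"workers to start:      {choose_worker_count()}")
```

Run inside a 1-core batch allocation on a cpuset-enforcing system, the first line would report the whole node while the second reports 1. A program that sizes its pool from the first number oversubscribes that single core. Python 3.13 and later also provide `os.process_cpu_count()`, which returns the affinity-based count directly.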

Thanks again
Nick

On 6 December 2017 at 12:58, Chris Samuel <chris at csamuel.org> wrote:

> On 6/12/17 11:44 am, Nick Evans wrote:
>
>> We have found that if we submit a job to the queue then it takes a long
>> time to process, i.e. >4 hours.
>> If we run the exact same processing directly on the compute node
>> then it is significantly faster (<1 hour).
>>
>
> Some quick ideas
>
> Are you comparing a job that has asked for all cores and all RAM with
> it running directly on the node?
>
> Try using "perf top" to get an idea of what's going on with the node
> when doing the comparison runs, perhaps "perf record" too but I can
> never remember if an unprivileged user can do that. That might shed
> some light.
>
> To me it sounds like it might be something that checks how
> many cores a node has naively and then starts that many threads/
> processes and if the batch job only asks for a single core, or
> less than all, then you might end up with a lot of contention.
>
> Good luck!
> Chris
> --
>  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>

