[Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

Scott Atchley e.scott.atchley at gmail.com
Sun Jun 10 05:33:22 PDT 2018


On Sun, Jun 10, 2018 at 4:53 AM, Chris Samuel <chris at csamuel.org> wrote:

> On Sunday, 10 June 2018 1:22:07 AM AEST Scott Atchley wrote:
>
> > Hi Chris,
>
> Hey Scott,
>
> > We have looked at this _a_ _lot_ on Titan:
> >
> > A Multi-faceted Approach to Job Placement for Improved Performance on
> > Extreme-Scale Systems
> >
> > https://ieeexplore.ieee.org/document/7877165/
>
> Thanks! IEEE has it paywalled but it turns out ACM members can read it
> here:
>
> https://dl.acm.org/citation.cfm?id=3015021
>
> > The issue we have is small jobs "inside" large jobs interfering with
> > the larger jobs. The item that was easy to implement with our scheduler
> > was "Dual-Ended Scheduling". We set a threshold of 16 nodes to demarcate
> > small. Jobs using more than 16 nodes schedule from the top/front of the
> > list, and smaller jobs schedule from the bottom/back of the list.
>
> I'm guessing for "list" you mean a list of nodes?


Yes. It may be specific to Cray/Moab.
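The dual-ended idea described above can be sketched in a few lines. This is a minimal illustration, not the actual Cray/Moab implementation; the node list, the first-fit-from-either-end policy, and the `allocate` helper are assumptions for the example, with only the 16-node threshold taken from the discussion:

```python
# Hypothetical sketch of dual-ended scheduling over an ordered node list.
# Jobs larger than the threshold allocate from the front of the list and
# small jobs allocate from the back, so small jobs cluster at one end
# rather than fragmenting the space the large jobs draw from.

SMALL_THRESHOLD = 16  # nodes; the demarcation used on Titan

def allocate(free_nodes, job_size):
    """Return a list of node IDs for the job, or None if it cannot fit.

    free_nodes is an ordered list of currently idle node IDs; allocated
    nodes are removed from it in place.
    """
    if job_size > len(free_nodes):
        return None
    if job_size > SMALL_THRESHOLD:
        picked = free_nodes[:job_size]    # large job: front of the list
    else:
        picked = free_nodes[-job_size:]   # small job: back of the list
    for node in picked:
        free_nodes.remove(node)
    return picked

nodes = list(range(100))
big = allocate(nodes, 32)    # drawn from the front: nodes 0..31
small = allocate(nodes, 4)   # drawn from the back: nodes 96..99
```

A real scheduler would of course layer backfill, priorities, and topology awareness on top; the point is only that the two job classes consume the ordered node list from opposite ends.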


>   It's an interesting idea
> and possibly something that might be doable in Slurm with some patching;
> for us it might be more like allocating sub-node jobs from the start of
> the list (to hopefully fill up holes left by other small jobs) and full-
> node jobs from the end of the list (where here "list" is a set of nodes
> of the same weight).
>
> You've got me thinking... ;-)
>
> All the best!
> Chris
>

Good luck. If you want to discuss, please do not hesitate to ask. We have
another paper pending along the same lines.

Scott
