[Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

Chris Samuel chris at csamuel.org
Sun Jun 10 01:53:24 PDT 2018


On Sunday, 10 June 2018 1:22:07 AM AEST Scott Atchley wrote:

> Hi Chris,

Hey Scott,

> We have looked at this _a_ _lot_ on Titan:
>
> A Multi-faceted Approach to Job Placement for Improved Performance on
> Extreme-Scale Systems
> 
> https://ieeexplore.ieee.org/document/7877165/

Thanks! IEEE has it paywalled, but it turns out ACM members can read it here:

https://dl.acm.org/citation.cfm?id=3015021

> The issue we have is small jobs "inside" large jobs interfering with the
> larger jobs. The item that was easy to implement with our scheduler was
> "Dual-Ended Scheduling". We set a threshold of 16 nodes to demarcate small.
> Jobs using more than 16 nodes schedule from the top/front of the list, and
> smaller jobs schedule from the bottom/back of the list.

I'm guessing that by "list" you mean a list of nodes?  It's an interesting idea, 
and possibly something that might be doable in Slurm with some patching. For us 
it might be more like allocating sub-node jobs from the start of the list (to 
hopefully fill up holes left by other small jobs) and full-node jobs from the 
end of the list (where the list is a set of nodes of the same weight).
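Just to make concrete what I'm imagining, here's a toy sketch in plain Python. 
It has nothing to do with Slurm's actual node selection plugin or data 
structures; the Node/Job classes, the node sizes and the select_nodes() helper 
are all made up for illustration:

#!/usr/bin/env python3
# Toy sketch of dual-ended node selection: sub-node jobs are packed from
# the front of the node list, whole-node jobs are taken from the back.
# Everything here is illustrative only, not Slurm internals.

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cores: int
    cores_free: int

@dataclass
class Job:
    name: str
    cores: int      # cores requested per node
    nodes: int = 1  # number of nodes requested

def select_nodes(job, nodelist):
    """Pick nodes for a job from a list already sorted by weight.

    Sub-node jobs (needing less than a whole node) scan from the front,
    so they pile up in the holes left by earlier small jobs; whole-node
    jobs scan from the back, keeping that end of the list unfragmented.
    """
    sub_node = job.nodes == 1 and job.cores < nodelist[0].cores
    candidates = nodelist if sub_node else reversed(nodelist)

    chosen = []
    for node in candidates:
        need = job.cores if sub_node else node.cores
        if node.cores_free >= need:
            chosen.append(node)
            if len(chosen) == job.nodes:
                break
    if len(chosen) < job.nodes:
        return None  # not enough free nodes right now, job stays pending

    for node in chosen:
        node.cores_free -= job.cores if sub_node else node.cores
    return chosen

if __name__ == "__main__":
    nodes = [Node(f"node{i:02d}", 32, 32) for i in range(8)]
    for job in [Job("small-a", 4), Job("small-b", 8),
                Job("big", 32, nodes=3), Job("small-c", 2)]:
        picked = select_nodes(job, nodes)
        names = [n.name for n in picked] if picked else "pending"
        print(f"{job.name:8s} -> {names}")

Running that, the small jobs all stack up on node00 while the 3-node job gets 
node05-node07 untouched. Obviously the real thing would have to live in the 
node selection code and cope with weights, features, topology and so on, but 
you get the idea.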

You've got me thinking... ;-)

All the best!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

