[Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?
Skylar Thompson
skylar.thompson at gmail.com
Sun Jun 10 03:26:15 PDT 2018
On Sun, Jun 10, 2018 at 06:46:04PM +1000, Chris Samuel wrote:
> On Sunday, 10 June 2018 1:48:18 AM AEST Skylar Thompson wrote:
>
> > We're a Grid Engine shop, and we have the execd/shepherds place each job in
> > its own cgroup with CPU and memory limits in place.
>
> Slurm supports cgroups as well (and we use it extensively), the idea here
> is more to try and avoid/minimise unnecessary inter-node MPI traffic.
We have very little MPI, but if I had to solve this in GE, I would try to
fill up one node before sending jobs to another. The queue sort order
(queue_sort_method in the scheduler configuration; defaults to instance
load, but can be set to a simple sequence number) is the general way to do
that, while the allocation rule for parallel environments (allocation_rule;
defaults to round_robin, but can be set to fill_up) is another, specific to
multi-slot jobs.
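As a rough illustration of why the allocation rule matters for inter-node
traffic, here is a toy Python model of the two rules. It is only a sketch of
the idea, not Grid Engine's actual scheduler logic; the function name
allocate() and the host names are made up for the example.

```python
def allocate(slots_free, slots_needed, rule):
    """Toy model of placing one multi-slot job across hosts.

    slots_free   -- dict of host -> free slots, already in scheduler order
    slots_needed -- number of slots the job requests
    rule         -- "fill_up" or "round_robin"
    Returns a dict of host -> slots granted on that host.
    """
    if slots_needed > sum(slots_free.values()):
        raise ValueError("not enough free slots for this job")

    free = dict(slots_free)  # work on a copy, leave the caller's dict alone
    grant = {}

    if rule == "fill_up":
        # Take as many slots as possible from each host before moving on.
        for host in free:
            if slots_needed == 0:
                break
            take = min(free[host], slots_needed)
            if take:
                grant[host] = take
                slots_needed -= take
    elif rule == "round_robin":
        # Hand out one slot per host per pass until the request is met.
        while slots_needed > 0:
            for host in free:
                if slots_needed == 0:
                    break
                if free[host] > 0:
                    free[host] -= 1
                    grant[host] = grant.get(host, 0) + 1
                    slots_needed -= 1
    else:
        raise ValueError("unknown rule: " + rule)

    return grant


hosts = {"node1": 8, "node2": 8, "node3": 8}
print(allocate(hosts, 8, "fill_up"))      # job stays on a single node
print(allocate(hosts, 8, "round_robin"))  # job is spread over all three
```

With three empty 8-slot nodes, the fill_up model keeps an 8-slot job on one
node, while the round_robin model spreads it across all three, which is the
case that generates the extra inter-node MPI traffic.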
Not sure of the specifics for Slurm, though.
--
Skylar