[Beowulf] Docker in HPC
hearnsj at googlemail.com
Wed Nov 27 10:06:06 PST 2013
On 27 November 2013 15:49, Peter Clapham <pc7 at sanger.ac.uk> wrote:
> The enforcement of the memory limit has to date either been via wrapping
> jobs on startup by the scheduler with ulimit or via a local daemon sending
> a kill command when it notices that the job or job component exceeded the
> initial set limits.
> Both the above approaches have limitations which can confuse users. The
> CGROUP approach seems to effectively take on the role of ulimits on
> steroids and allows for accurate memory tracking and enforcement. This
> ensures that the job output includes the actual memory usage when killed as
> well as ensuring that the job cannot break the set limits.
I totally agree with what you say.
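The ulimit-wrapping approach described above can be sketched as follows (a minimal illustration, not any particular scheduler's wrapper; python3 here just plays the part of a memory-hungry job, and the 200000 KiB limit is an arbitrary example value):

```shell
# Wrap the "job" in a shell that sets a virtual-memory ulimit before exec'ing it.
# This is the per-process approach that cgroup-based enforcement replaces:
# the limit only applies to this process tree, and the job just sees a
# failed allocation (or dies) with no accounting of actual usage.
if bash -c 'ulimit -v 200000; python3 -c "x = bytearray(500*1024*1024)"' 2>/dev/null
then
    echo "job survived"
else
    echo "job killed by limit"
fi
```

The 500 MiB allocation cannot succeed under a ~195 MiB virtual-memory cap, so the wrapped job exits non-zero, which is the kind of opaque failure mode that confuses users, as noted above.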
If you run jobs with cpusets, then you don't need to depend on limits or on
the batch scheduler getting round to killing out-of-limits jobs
- if the job gets 'too big for its boots', the OOM killer deals with it.
Also, collapsing the cpuset at the end of a job means that badly behaved
codes don't leave stray processes running.
(Yes, I know you can and should run epilogue scripts after a job finishes
to deal with these - and indeed I do.)