<div class="gmail_quote">On 27 November 2013 15:49, Peter Clapham <span dir="ltr"><<a href="mailto:pc7@sanger.ac.uk" target="_blank">pc7@sanger.ac.uk</a>></span> wrote:<br>
<blockquote style="BORDER-LEFT:#ccc 1px solid;MARGIN:0px 0px 0px 0.8ex;PADDING-LEFT:1ex" class="gmail_quote">
<div text="#000000" bgcolor="#FFFFFF">The enforcement of the memory limit has to date either been via wrapping jobs on startup by the scheduler with ulimit or via a local daemon sending a kill command when it notices that the job or job component exceeded the initial set limits.<br>
<br>Both the above approaches have limitations which can confuse users. The CGROUP approach seems to effectively take on the roll of ulimits on steroids and allows for accurate memory tracking and enforcement. This ensures that the job output includes the actual memory usage when killed as well as ensuring that the job cannot break the set limits.<br>
<br></div></blockquote>
I totally agree with what you say.
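
To make the first approach Peter describes concrete, here is a minimal sketch (my own Python illustration with an arbitrary limit, not any scheduler's actual code) of "wrapping jobs on startup with ulimit". It also shows where the confusion comes from: RLIMIT_AS is a per-process limit, so a job that forks several workers can exceed its total request without any single process tripping the limit.

#!/usr/bin/env python3
# Sketch only: fork, apply a per-process address-space limit, then exec the
# job. Children the job forks each inherit their own copy of the limit, so
# the total across the job can still exceed what was requested.
import os
import resource
import sys

MEM_LIMIT_BYTES = 4 * 1024 ** 3      # arbitrary example: a 4 GiB request

def run_wrapped(argv):
    pid = os.fork()
    if pid == 0:
        # Child: set the limit, then become the job.
        resource.setrlimit(resource.RLIMIT_AS, (MEM_LIMIT_BYTES, MEM_LIMIT_BYTES))
        os.execvp(argv[0], argv)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

if __name__ == "__main__":
    sys.exit(run_wrapped(sys.argv[1:]))
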
If you run jobs with cpusets, then you don't need to depend on limits or on the batch scheduler getting round to killing out-of-limits jobs - if the job gets 'too big for its boots', the OOM killer deals with it.
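
The kernel side of that is not much more than the following rough sketch. It assumes a cgroup v1 memory controller mounted at /sys/fs/cgroup/memory, root privileges, and a cgroup name of my own choosing; the point is that once the whole job sits in one memory cgroup, the limit and the accounting cover every process it forks, and the OOM killer enforces the limit for you.

#!/usr/bin/env python3
# Sketch: create a memory cgroup for the job, set the limit, start the job
# inside it. Children the job forks inherit the cgroup, so the limit applies
# to the job as a whole rather than per process.
import os
import subprocess

CG_ROOT = "/sys/fs/cgroup/memory"          # assumed cgroup v1 mount point

def _enter_cgroup(cg):
    # Runs in the child after fork, before exec: move it into the cgroup.
    with open(os.path.join(cg, "cgroup.procs"), "w") as f:
        f.write(str(os.getpid()))

def run_in_memory_cgroup(job_id, argv, limit_bytes):
    cg = os.path.join(CG_ROOT, f"job_{job_id}")   # name is arbitrary
    os.makedirs(cg, exist_ok=True)
    with open(os.path.join(cg, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))
    proc = subprocess.Popen(argv, preexec_fn=lambda: _enter_cgroup(cg))
    status = proc.wait()
    # Peak usage is tracked by the kernel, so it can be reported afterwards.
    with open(os.path.join(cg, "memory.max_usage_in_bytes")) as f:
        peak = int(f.read())
    print(f"job {job_id} exited {status}, peak memory {peak} bytes")
    return status

# Example: a 2 GiB limit on a hypothetical job script.
# run_in_memory_cgroup(42, ["/path/to/job.sh"], 2 * 1024 ** 3)

Reading memory.max_usage_in_bytes afterwards is what lets the actual peak usage be reported back in the job output, as Peter notes.
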
Also, collapsing a cpuset at the end of a job means that you don't have stray processes left running by badly behaved codes.
(Yes, I know you can and should run scripts after a job finishes to deal with these - and indeed I do.)
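
Collapsing the cpuset or cgroup at job end amounts to killing whatever is still listed in it and then removing the directory. A rough epilogue sketch, reusing the cgroup v1 layout assumed above:

import os
import signal
import time

def collapse_job_cgroup(cg):
    """Kill anything still in the job's cgroup, then remove it."""
    procs_file = os.path.join(cg, "cgroup.procs")
    for _ in range(10):                        # retry: killing can race with forks
        pids = [int(p) for p in open(procs_file).read().split()]
        if not pids:
            break
        for pid in pids:
            try:
                os.kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass                           # already gone
        time.sleep(0.1)
    os.rmdir(cg)                               # only succeeds once the cgroup is empty

Because the kernel keeps the membership list for you, this catches stray processes regardless of how badly behaved the code was about its process tree.
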