[Beowulf] [OOM killer/scheduler] disabling swap on cluster nodes?
Prentice Bisbal
prentice.bisbal at rutgers.edu
Mon Feb 9 10:56:01 PST 2015
On 02/09/2015 03:43 AM, Remy Dernat wrote:
>
> On 09/02/2015 03:56, Christopher Samuel wrote:
>> On 07/02/15 14:57, Alan Louis Scheinine wrote:
>>
>>> Only problem I've seen is that if a user allocates too much memory,
>>> OOM killer can kill maintenance processes such as a scheduler daemon.
>> This is why we disable overcommit. :-)
>>
> Hi,
>
> I have already seen that problem on our master. The scheduler, SGE, ran
> out of memory and the OOM killer decided to kill it:
>
> Dec 1 15:01:07 cluster1 kernel: Out of memory: Kill process 7963
> (sge_qmaster) score 948 or sacrifice child
>
> I resolved that issue by disabling "schedd_job_info" in SGE with
> "qconf -msconf".
>
> However, this setting provides significant information about our jobs.
>
> How should I adjust the OOM killer? Should I set
> vm.overcommit_memory = 2?
>
>
To be clear, setting vm.overcommit_memory doesn't directly affect the
behavior of the OOM killer. Turning off overcommit prevents the Linux
virtual memory system from making promises it can't always keep, which
reduces or eliminates the need for the OOM killer.
Setting vm.overcommit_memory = 2 turns off overcommitting and is the
best choice if you want to avoid the OOM killer.
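
One thing worth checking before flipping that switch on nodes with swap
disabled: in mode 2 the kernel refuses allocations beyond a commit limit,
which the kernel documentation gives as swap plus a percentage
(vm.overcommit_ratio, default 50) of RAM not reserved for huge pages. With
no swap and the default ratio, that works out to roughly half of RAM. The
rough Python sketch below only does that arithmetic. The function names
parse_meminfo and commit_limit_kb are made up for illustration, and it
assumes vm.overcommit_kbytes is left at 0 (a nonzero value replaces the
ratio).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: leading integer value}.

    Most fields are reported in kB; pure counters such as HugePages_Total
    are returned as plain integers.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50, hugetlb_kb=0):
    """Approximate the mode-2 commit limit in kB.

    Formula from the kernel's overcommit documentation:
        (RAM - huge page memory) * overcommit_ratio / 100 + swap
    Assumes vm.overcommit_kbytes is 0, so the ratio is what applies.
    """
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb
```

To try it on a node, feed parse_meminfo the contents of /proc/meminfo,
pass its MemTotal and SwapTotal values to commit_limit_kb along with the
ratio from /proc/sys/vm/overcommit_ratio, and compare the result with the
CommitLimit line the kernel reports. If the limit comes out too small for
your jobs, raising vm.overcommit_ratio is the usual knob.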
--
Prentice