[Beowulf] size of swap partition
Mikhail Kuzminsky
kus at free.net
Tue Jun 10 10:35:46 PDT 2008
In message from Mark Hahn <hahn at mcmaster.ca> (Tue, 10 Jun 2008
00:58:12 -0400 (EDT)):
...
>for instance, you can always avoid OOM with the vm.overcommit_memory=2
>sysctl (you'll need to tune vm.overcommit_ratio and the amount of swap
>to get the desired limits.) in this mode, the kernel tracks how much VM
>it actually needs (worst-case, reflected in Committed_AS in
>/proc/meminfo) and compares that to a commit limit that reflects ram
>and swap.
>
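The comparison described above can be checked on a running node. Below is a minimal sketch in Python, assuming a Linux /proc/meminfo with the usual "Name: value kB" lines. The helper names parse_meminfo and commit_headroom_kib are made up for illustration, and the SAMPLE text holds invented figures for a hypothetical node, not measurements.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kib(fields):
    """CommitLimit minus Committed_AS, in kB; negative means over the limit."""
    return fields["CommitLimit"] - fields["Committed_AS"]


# Illustrative sample only: a hypothetical node with 16 GB RAM and 20 GB swap.
SAMPLE = """\
MemTotal:       16777216 kB
SwapTotal:      20971520 kB
CommitLimit:    29360128 kB
Committed_AS:    8388608 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the invented sample
    info = parse_meminfo(text)
    print("commit headroom (kB):", commit_headroom_kib(info))
```

The kernel reports CommitLimit in every mode, but it is only enforced when vm.overcommit_memory=2, so on a node running in the default mode the headroom is informational.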
>if you don't use overcommit_memory=2, you are basically borrowing VM
>space in hopes of not needing it. that can still be reasonable,
>considering how often processes have a lot of shared VM, and how many
>processes allocate but never touch lots of pages. but you have to ask
>yourself: would I like a system that was actually _using_ 16 GB of swap?
>if you have 16x disks, perhaps, but 16G will suck if you only have 1
>disk. at least for overcommit_memory != 2, I don't see the point of
>configuring a lot of swap, since the only time you'd use it is if you
>were thrashing. sort of a "quality of life" argument.
>
>>> But what are the recommendations of modern praxis?
>
>it depends a lot on the size variance of your jobs, as well as their
>real/virtual ratio. the kernel only enforces RLIMIT_AS
>(vsz in ps), assuming a 2.6 kernel - I forget whether 2.4 did
>RLIMIT_RSS or not.
>
>if you use overcommit_memory=2, your desired max VM size determines
>the amount of swap. otherwise, go with something modest - memory size
>or so. but given that the smallest reasonable single disk these days
>is probably about 320GB, it's hard to justify being _too_ tight.
:-) The disks we use in our nodes are WD 10K RPM SATA drives with 70 GB :-))
We didn't set overcommit_memory=2, but we do use a strongly restricted
scheduling policy for SGE batch jobs, running only a few applications. We
have only batch jobs (no interactive ones), and moreover practically only
*long batch jobs*. As a result, the total VM requested per node is
equal to or lower than RAM, and there is practically zero swap activity.
The only exceptions are (seldom executed) small test jobs,
non-parallelized, mainly for checking input data. They use a small
amount of RAM. So it looks to me that I may even set a swap size lower
than 1.5*RAM (I think RAM+4G = 20G will be enough).
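To put numbers on the two swap sizes, here is a rough sketch in Python of the commit limit each would give if a node were switched to vm.overcommit_memory=2 (which we do not do). It assumes 16 GB of RAM, as implied by RAM+4G = 20G, and the kernel default vm.overcommit_ratio of 50, and it ignores huge pages. The function names are hypothetical, chosen only for this example.

```python
def commit_limit_gb(ram_gb, swap_gb, overcommit_ratio=50):
    """Approximate CommitLimit under vm.overcommit_memory=2.

    Uses swap + RAM * overcommit_ratio / 100 and ignores huge pages,
    which the kernel subtracts from RAM before applying the ratio.
    """
    return swap_gb + ram_gb * overcommit_ratio / 100.0


def swap_for_target_gb(ram_gb, target_vm_gb, overcommit_ratio=50):
    """Swap needed so that the commit limit reaches target_vm_gb."""
    return max(0.0, target_vm_gb - ram_gb * overcommit_ratio / 100.0)


if __name__ == "__main__":
    ram = 16  # GB, assumption implied by "RAM+4G = 20G"
    for swap in (20, 24):  # RAM+4G versus 1.5*RAM
        print("swap", swap, "GB -> commit limit",
              commit_limit_gb(ram, swap), "GB")
```

With these assumptions both sizes leave the commit limit well above RAM, so the choice between them matters only if the total VM requested per node grows far beyond what the current scheduling policy allows.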
In message from Walid <walid.shaari at gmail.com> (Tue, 10 Jun 2008
19:27:43 +0300):
>Hi,
>For an 8GB dual socket quad core node, choosing in the kick start
>file --recommended instead of specifying size RHEL5 allocates 1GB of
>memory. our developers say that they should not swap as this will
>cause an overhead, and they try to avoid it as much as possible
OpenSuSE 10.3 recommends a swap size of only 2 GB, but I don't know
whether the SuSE installation software makes some estimate of the
server's RAM or not.
Yours
Mikhail