[Beowulf] disabling swap on cluster nodes?
Robert G. Brown
rgb at phy.duke.edu
Sat Feb 7 06:56:26 PST 2015
On Sat, 7 Feb 2015, Mahmood Sayed wrote:
> Prentice,
>
> We regularly configure our compute nodes without any swap partition.
> There have been no adverse effects on the systems' performance under
> load. We're running clusters with everything from RHEL5/RHEL6 and the
> FOS variants thereof to several LTS versions of Ubuntu. RAM per node
> ranges from 32GB to 1TB. Jobs have run for several weeks without issue.
In the old days systems would run more efficiently with at least a small
swap partition than without one. It isn't too difficult to create a small
ramdisk and turn it into swap. Basically, this gives the kernel a place
to juggle things like pointers and shared library images. If you run
"free" on a system sometime after it has been up for a while, you can see
how much swap is in use even though the system hasn't had any need to
"really" swap. That can give you an idea of how large a ramdisk to
allocate in order to keep the kernel happy and smooth things like paging,
context switching, and buffering. Or you can run e.g. vmstat 5 and watch
the swap activity while running a moderate load, or look directly at
/proc/vmstat at a much larger time granularity to see changes. At a
guess, 100 MB is plenty and hardly affects the memory you have left to
allocate to actual tasks. A ramdisk also means that you don't pay much
of a latency/bandwidth penalty even for these "efficiency" tasks.
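For concreteness, a minimal sketch of what that might look like on a
stock Linux node -- the brd module, device name, ~100 MB size, and swap
priority here are my assumptions, not a recommendation:

   # load the ramdisk driver with one ~100 MB device (rd_size is in KiB)
   modprobe brd rd_nr=1 rd_size=102400
   mkswap /dev/ram0
   swapon -p 10 /dev/ram0         # higher priority than any disk swap

   # then see how much of it the kernel actually touches over time
   free -m
   vmstat 5                       # watch the si/so columns under load
   grep -E 'pswp(in|out)' /proc/vmstat   # cumulative swap-in/out counts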
This may be less true now with systemd (I'm only running a single system
with systemd so far and haven't really figured it out). Older kernels
were designed around memory being a scarce resource and disk being a
(comparatively) plentiful one. I don't know how much they've been
redesigned to relax this assumption in a world where systems with less
than 4 GB of main memory are becoming scarce.
I will say that any sort of VM system is going to become very sad indeed
if you ever run out of, or ALMOST run out of, operational memory. Swap
is like the flexibility of your heart's aorta. Nominally the heart can
work perfectly fine by just pumping a steady flow into a fixed system of
plumbing, but in reality it pumps in bursts, and having something to
buffer the peak pressure keeps the system from breaking. The computer
nominally runs just a handful of tasks and smoothly switches between
them, but in reality there are often "random" demands for kernel
attention and Poissonian bursts of activity. Swap gives the kernel a
place to put down work transiently and efficiently during one of those
bursts, even if it COULD just page back to/from disk.
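(As an aside, if you do go the fully swapless route Prentice asked
about, the usual knobs are the overcommit sysctls he mentions; the
values below are illustrative guesses on my part, not tested
recommendations:)

   # strict accounting: make allocations fail up front rather than
   # letting the OOM killer pick off a long-running job later
   sysctl -w vm.overcommit_memory=2
   sysctl -w vm.overcommit_ratio=95    # commit limit ~95% of RAM
   # with a small ramdisk swap in place instead, bias the kernel away
   # from using it except under real memory pressure
   sysctl -w vm.swappiness=10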
rgb
> ~~~~~~~~~~~~~~~~~~~~~~~
> Mahmood A. Sayed
> Sr Systems Programmer
> Research Support
> Pratt School of Engineering
> Duke University
> ~~~~~~~~~~~~~~~~~~~~~~~
>
>> On Feb 6, 2015, at 5:35 PM, Prentice Bisbal <prentice.bisbal at rutgers.edu> wrote:
>>
>> Do any of you disable swap on your compute nodes?
>>
>> I brought this up in a presentation I gave last night on HPC system administration, but realized I never actually did this myself, nor do I know of anyone who has. I would tweak the vm.overcommit_memory setting, but that's not the same as disabling swap altogether. I'd like to try doing this in the future, but I prefer to learn from someone else's mistakes first.
>>
>>
>> --
>> Prentice
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu