[Beowulf] confused about high values of "used" memory under "top" even without running jobs
Mark Hahn
hahn at mcmaster.ca
Wed Aug 12 12:07:57 PDT 2009
> I am a bit confused about the high "used" memory that top is showing on one
> of my machines. Is this "leaky" memory caused by codes that did not return
> all their memory? Can I identify who is hogging the memory? Any other ways
> to "release" this memory?
free memory is WASTED memory. linux tries hard to keep only a smallish,
limited amount of memory wasted. if you add up rss of all processes,
the difference between that and 'used' is normally dominated by kernel
page-cache. see /proc/sys/vm/drop_caches for a way to force the kernel
to throw away clean FS-related caches.
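to see how much of 'used' is really cache, something like this works
(free -m plus a quick RSS sum; standard tools, MB and KB respectively):

  free -m                                    # 'buffers' + 'cached' is reclaimable
  ps -eo rss= | awk '{s+=$1} END{print s" KB total RSS"}'

and dropping the clean caches looks like this (as root; sync first so
dirty pages get written back):

  sync
  echo 3 > /proc/sys/vm/drop_caches          # 1=pagecache, 2=dentries+inodes, 3=both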
also, I often do this:
awk '{print $3*$4,$0}' /proc/slabinfo|sort -rn|head
to get a quick snapshot of which kernel slab caches are using the most memory.
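(for reference: in 2.6 slabinfo $3 is num_objs and $4 is objsize, so
$3*$4 is roughly the bytes held by each slab cache; "slabtop" from
procps shows much the same thing interactively.)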
> Linux is also supposed to start using as much memory as you give it? Just
> confused if this is something I need to worry about or not.
you should never worry about paging (swapping, thrashing) until you see
nontrivial swap-in (NOT swap-out) traffic (i.e., the 'si' column in "vmstat 1").
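for instance:

  vmstat 1 10    # watch the 'si' column over 10 one-second samples;
                 # sustained nonzero si means pages are being faulted
                 # back in from swap, which is when performance suffers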
> Incidentally the way I discovered this was because users reported that their
> codes were running ~30% faster right after a machine reboot as opposed to
> after a few days running.
isn't this one of the anomalous nehalem machines we've been talking about?
if so, it's become clear that the kernel isn't managing memory in a
numa-aware way, so the problem is probably just poor numa layout/balance
of allocations.
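if numactl/numastat happen to be installed, they'll show the per-node
picture:

  numactl --hardware    # memory size and free per numa node
  numastat              # numa_hit/numa_miss/numa_foreign counters
                        # (growing numa_miss/numa_foreign means allocations
                        #  are landing off their preferred node)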
> that in a scheduler based environment (say PBS) the last job releases
> all its memory resources before the new one starts running?
you could drop_caches from a job epilogue, but this would also hurt you
sometimes, since the next job then has to re-read anything it would
otherwise have found already sitting in the page-cache.
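if you do want to try it, a minimal sketch of a torque/PBS epilogue
(the epilogue path and whether you want this at all are site-specific,
so treat it as a guess rather than a recommendation):

  #!/bin/sh
  # hypothetical per-job epilogue: write back dirty pages, then drop
  # clean page-cache/dentries/inodes so the next job starts from a
  # known (cold-cache) state
  sync
  echo 3 > /proc/sys/vm/drop_caches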