[Beowulf] first cluster

Christopher Samuel samuel at unimelb.edu.au
Thu Jul 15 22:31:25 PDT 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 16/07/10 14:30, Rahul Nabar wrote:

> Is it possible to know how much over-committed my OS was,
> say in the last one day. Or at least instantaneously.

I would suggest that you may not want to run your systems
overcommitted; I feel that it's much nicer for an application
to have a malloc() fail than for the OOM killer to get invoked.
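
For the instantaneous part of the question, a rough sketch (not
a tested tool) is to compare the Committed_AS and CommitLimit
fields in /proc/meminfo. The function names here are made up for
illustration. Bear in mind that, as far as I know, CommitLimit is
only enforced when vm.overcommit_memory is set to 2; in the other
modes a ratio above 1.0 just tells you how far past that notional
limit the node has gone. For a day's history you would have to
sample this periodically yourself.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_ratio(fields):
    """Return Committed_AS / CommitLimit, or None if either field is missing."""
    limit = fields.get("CommitLimit")
    committed = fields.get("Committed_AS")
    if not limit or committed is None:
        return None
    return committed / limit


def read_commit_ratio(path="/proc/meminfo"):
    """Read the given meminfo file and return the commit ratio."""
    with open(path) as f:
        return commit_ratio(parse_meminfo(f.read()))


if __name__ == "__main__":
    ratio = read_commit_ratio()
    if ratio is None:
        print("CommitLimit/Committed_AS not found in /proc/meminfo")
    else:
        print("Committed_AS is %.2f x CommitLimit" % ratio)
```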

On the topic of memory usage, the Linux kernel has been
(until fairly recently) rather bad at reporting that
reliably (or at least usefully).  There were some recent
patches that improved its memory accounting and there's
a tool called "smem" which gives an interesting way of
looking at things (packaged in Debian and Ubuntu):

http://www.selenic.com/smem/

Not sure if it'll work on RHEL 5 though, as the kernel
is likely too ancient for it.

> Ah! this might explain why once in a while I have a node with sshd
> dead. Is it possible to tell the kernel that certain processes are
> "privileged" and when it seeks to find random processes to kill it
> should not select these "privileged" processes? Some candidates that
> come to my mind are sshd, nagios and pbs_mom

You're in luck, there was an LWN article last year which touched on
this:

http://lwn.net/Articles/317814/

# Users and system administrators have often asked for ways to
# control the behavior of the OOM killer. To facilitate control,
# the /proc/<pid>/oom_adj knob was introduced to save important
# processes in the system from being killed, and define an order
# of processes to be killed. The possible values of oom_adj
# range from -17 to +15. The higher the score, more likely the
# associated process is to be killed by OOM-killer. If oom_adj
# is set to -17, the process is not considered for OOM-killing.
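
As a minimal sketch of using that knob (assuming a kernel that
has /proc/<pid>/oom_adj, and that you run it as root, since
lowering the value needs privileges), something like the
following would exempt a given process. The helper names and the
proc_root parameter are my own invention for illustration; you
would still need to look up the PIDs of sshd, nagios and pbs_mom
yourself.

```python
import os

OOM_DISABLE = -17  # per the LWN text: process is not considered for OOM-killing
OOM_ADJ_MIN, OOM_ADJ_MAX = -17, 15  # range quoted in the LWN article


def oom_adj_path(pid, proc_root="/proc"):
    """Build the path to the oom_adj file for a PID (proc_root is overridable for testing)."""
    return os.path.join(proc_root, str(pid), "oom_adj")


def set_oom_adj(pid, value=OOM_DISABLE, proc_root="/proc"):
    """Write an oom_adj value for the given PID; raises ValueError if out of range."""
    if not OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        raise ValueError("oom_adj must be between %d and %d" % (OOM_ADJ_MIN, OOM_ADJ_MAX))
    with open(oom_adj_path(pid, proc_root), "w") as f:
        f.write(str(value))


if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python protect.py <pid> [<pid> ...]
    for arg in sys.argv[1:]:
        set_oom_adj(int(arg))
```

The setting does not survive a restart of the daemon, so it
would have to be reapplied, for example from the init script.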

cheers!
Chris
- -- 
 Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computational Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkw/7q0ACgkQO2KABBYQAh+5dwCdH7FvlO6Fv1XP0f58r1q+0cVC
YV4AniFwSLScUnqkgmE/crX+htauzx2P
=DnRX
-----END PGP SIGNATURE-----


