Linux memory leak?

Josip Loncaric josip at icase.edu
Thu Feb 28 11:58:07 PST 2002


On our heterogeneous cluster we run Red Hat 7.2, updated to the stock i686
Linux kernels 2.4.9-21 or 2.4.9-21smp.  Occasionally (e.g. after 14 days of
normal operation) our nodes report unusually high memory usage even when
no user processes are running.  This happens on both single-CPU and
dual-CPU machines, and it also happened with earlier 2.4 kernels.  Here
is an example:

# free
             total       used       free     shared    buffers     cached
Mem:        512444     449196      63248          0      70164      76332
-/+ buffers/cache:     302700     209744
Swap:      1060272     285492     774780
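The '-/+ buffers/cache' row of 'free' is derived from the 'Mem:' row.  As
a minimal sketch, assuming the procps convention of this era (memory held
by buffers and the page cache is treated as reclaimable), the arithmetic
can be reproduced as below; the function name buffers_cache_row is purely
illustrative, not part of any tool:

```python
def buffers_cache_row(total, used, free, buffers, cached):
    """Recompute free(1)'s '-/+ buffers/cache' row from its 'Mem:' row (kB).

    Assumption: buffers and page cache are subtracted from 'used' and
    added to 'free', as in the procps output quoted above.
    """
    app_used = used - buffers - cached
    app_free = free + buffers + cached
    # Both views must still account for all of physical memory.
    if app_used + app_free != total:
        raise ValueError("rows do not add up to total")
    return app_used, app_free


# Figures copied from the 'Mem:' row above.
print(buffers_cache_row(total=512444, used=449196, free=63248,
                        buffers=70164, cached=76332))
# prints (302700, 209744), matching the second row of the output
```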

If I add up all the RSS values reported by 'ps -e v', I get only about
20,500 KB, yet this dual-CPU system reports 302,700 KB of RAM in use
(not even counting buffers or cache).  Apparently only a reboot can
recover the missing 282,200 KB.  Any ideas on how to track down where
the missing memory went?
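The RSS total can be recomputed mechanically rather than by hand.  The
following is a hypothetical helper (sum_rss_kb is not an existing tool);
it assumes the column layout of the 'ps -e v' listing in the P.S. and
locates the RSS column from the header line.  SAMPLE holds only four rows
copied from that listing:

```python
# Four rows copied from the 'ps -e v' listing (RSS is in kB).
SAMPLE = """\
  PID TTY      STAT   TIME  MAJFL   TRS   DRS  RSS %MEM COMMAND
    1 ?        S      0:05    139    23  1392  480  0.0 init
    5 ?        SW     0:08      0     0     0    0  0.0 [kswapd]
  453 ?        S      0:00     79    23  1452  644  0.1 syslogd -m 0
  631 ?        SL     0:03     24   234  1705 1936  0.3 ntpd -U ntp
"""


def sum_rss_kb(ps_output: str) -> int:
    """Sum the RSS column of 'ps -e v' output (hypothetical helper)."""
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    rss_idx = header.index("RSS")
    total = 0
    for line in lines[1:]:
        if not line.strip():
            continue  # ignore blank lines
        # COMMAND is the last column and may contain spaces, so bound the split.
        fields = line.split(None, len(header) - 1)
        total += int(fields[rss_idx])
    return total


print(sum_rss_kb(SAMPLE))  # prints 3060 for these four rows
```

On a live node the input could come from
subprocess.run(["ps", "-e", "v"], capture_output=True, text=True).stdout
(Python 3.7+).  Kernel threads such as [kswapd] report an RSS of 0, so
they add nothing to the sum.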

Sincerely,
Josip

P.S. Here is more detail:

# cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  524742656 460013568 64729088        0 71929856 369848320
Swap: 1085718528 292343808 793374720
MemTotal:       512444 kB
MemFree:         63212 kB
MemShared:           0 kB
Buffers:         70244 kB
Cached:          76332 kB
SwapCached:     284848 kB
Active:         242464 kB
Inact_dirty:    188960 kB
Inact_clean:         0 kB
Inact_target:   131068 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       512444 kB
LowFree:         63212 kB
SwapTotal:     1060272 kB
SwapFree:       774780 kB
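One cross-check the numbers above allow: in the byte-valued table at the
top, the 'cached:' column (369,848,320 bytes = 361,180 kB) equals Cached
(76,332 kB) plus SwapCached (284,848 kB), whereas 'free' showed only the
76,332 kB figure.  This is an observation from these figures, not a
diagnosis, although the 284,848 kB of swap cache is close in size to the
missing 282,200 kB.  The sketch below performs the check; parse_meminfo
and table_cached_kb are hypothetical helpers that assume the 2.4-style
layout shown here:

```python
# Subset of the /proc/meminfo listing above.
MEMINFO = """\
        total:    used:    free:  shared: buffers:  cached:
Mem:  524742656 460013568 64729088        0 71929856 369848320
Swap: 1085718528 292343808 793374720
MemTotal:       512444 kB
MemFree:         63212 kB
Buffers:         70244 kB
Cached:          76332 kB
SwapCached:     284848 kB
"""


def parse_meminfo(text: str) -> dict:
    """Parse the 'Key:  value kB' lines; the leading byte table is skipped."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB":
            values[key.strip()] = int(parts[0])
    return values


def table_cached_kb(text: str) -> int:
    """Return the 'cached:' column of the byte table, converted to kB."""
    for line in text.splitlines():
        if line.startswith("Mem:"):
            # Columns: total, used, free, shared, buffers, cached (bytes).
            return int(line.split()[6]) // 1024
    raise ValueError("no 'Mem:' row found")


info = parse_meminfo(MEMINFO)
print(table_cached_kb(MEMINFO), info["Cached"] + info["SwapCached"])
# prints 361180 361180
```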

# ps -e v 
  PID TTY      STAT   TIME  MAJFL   TRS   DRS  RSS %MEM COMMAND
    1 ?        S      0:05    139    23  1392  480  0.0 init
    2 ?        SW     0:00      0     0     0    0  0.0 [keventd]
    3 ?        SWN    0:01      0     0     0    0  0.0 [ksoftirqd_CPU0]
    4 ?        SWN    0:01      0     0     0    0  0.0 [ksoftirqd_CPU1]
    5 ?        SW     0:08      0     0     0    0  0.0 [kswapd]
    6 ?        SW     0:00      0     0     0    0  0.0 [kreclaimd]
    7 ?        SW     0:00      0     0     0    0  0.0 [bdflush]
    8 ?        SW     0:00      0     0     0    0  0.0 [kupdated]
    9 ?        SW<    0:00      0     0     0    0  0.0 [mdrecoveryd]
   13 ?        SW     0:13      0     0     0    0  0.0 [kjournald]
   88 ?        SW     0:00      0     0     0    0  0.0 [khubd]
  154 ?        SW     0:01      0     0     0    0  0.0 [kjournald]
  428 ?        S      0:00     41    46  1485  504  0.0 /sbin/pump -i et
  453 ?        S      0:00     79    23  1452  644  0.1 syslogd -m 0
  458 ?        S      0:00     46    18  2077  508  0.0 klogd -2
  478 ?        S      0:00     83    25  1538  604  0.1 portmap
  506 ?        S      0:00    110    21  1590  616  0.1 rpc.statd
  631 ?        SL     0:03     24   234  1705 1936  0.3 ntpd -U ntp
  685 ?        S      0:00     20    12  1439  508  0.0 /usr/sbin/atd
  703 ?        S      0:00     32   232  2451  656  0.1 /usr/sbin/sshd
  736 ?        S      0:00    143   133  2138  820  0.1 xinetd -stayaliv
  795 ?        S      0:00     75    18  1573  624  0.1 crond
  843 tty1     S      0:00    109     6  1381  368  0.0 /sbin/mingetty t
  844 tty2     S      0:00    109     6  1381  368  0.0 /sbin/mingetty t
  845 tty3     S      0:00    109     6  1381  368  0.0 /sbin/mingetty t
  846 tty4     S      0:00    109     6  1381  368  0.0 /sbin/mingetty t
  847 tty5     S      0:00    109     6  1381  368  0.0 /sbin/mingetty t
  848 tty6     S      0:00    109     6  1381  368  0.0 /sbin/mingetty t
  849 ?        S      2:25    162    10  1429  584  0.1 /opt/sbin/cnm -i 
  850 ?        S      1:39    243   484  1747  928  0.1 /bin/bash /opt/s
 1105 ?        SW     0:14      0     0     0    0  0.0 [rpciod]
 1106 ?        SW     0:00      0     0     0    0  0.0 [lockd]
11105 ?        S      0:51    125   149  1794 1072  0.2 /usr/PBS/sbin/pb
24146 ?        S      0:00      9   423  4804 1996  0.3 sendmail: accept
27052 ?        S      0:00      0   400    39  172  0.0 /sbin/dhcpcd -n 
27219 ?        S      0:00    289    12  2243 1064  0.2 in.rlogind
27220 pts/0    S      0:00    288    16  2339 1120  0.2 login -- root
27221 pts/0    S      0:00    288   484  2047 1360  0.2 -bash
27314 ?        S      0:00    168     9  1934  680  0.1 sleep 60
27315 pts/0    R      0:00    175    59  2588  716  0.1 ps -e v

# uptime
  2:53pm  up 14 days, 17:03,  1 user,  load average: 0.00, 0.00, 0.00

-- 
Dr. Josip Loncaric, Research Fellow               mailto:josip at icase.edu
ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134


