Linux memory leak?
Josip Loncaric josip at icase.edu
Fri Mar 1 12:21:22 PST 2002
- Previous message: Motherboard query...
- Next message: Scyld: Beosetup, Beostatus, Beompi
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Mark Hahn wrote:
>
> > normal. Still, if 'free' cannot be always trusted, then system
> > management decisions based on free memory can be flaky.
>
> free=wasted, to Linux. if you're worried about MM sanity,
> you should look at swap *ins*...

I'm primarily interested in RAM usage minus buffers minus cache, which our batch scheduler uses to avoid paging. The 'free' problem can happen on our 384MB, 512MB, 1GB and 2GB machines, but otherwise it is similar to the Red Hat bug report http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=59002, which talks about 2GB+ systems. Gerry Morong describes the problem this bug causes for the LSF batch scheduler:

> I am experiencing a similar problem on Red Hat 7.2 with the 2.4.9-* kernels.
> If I run jobs past core memory into swap, significant memory and swap are still
> allocated when the jobs finish. Have tested many configurations (mostly 2
> processor) 1GB, 2GB, 3GB, 4GB RAM. All have the same problem. Example: 2P,
> 4GB RAM, 8GB swap. Run 2 jobs both asking for 2.5GB (total 5GB). Memory and
> swap both push 4GB each. When the jobs finish, both memory and swap are still
> holding 2.5GB of space each. Eventually our compute farm managed by LSF will
> not allocate jobs to each machine because free memory is almost non-existent.

The key point above is the comment about LSF: bad system data leads to bad scheduler behavior.

BTW, the "malloc() all memory, then quit" procedure does not always fix the numbers reported by 'free'. On our largest node (2GB RAM plus 4GB swap), running the stock Red Hat kernel 2.4.9-21, the most malloc() can get is 1919 MB, not even close to the 3GB process address space limit, even though 'ulimit' is unlimited. Once this 1919 MB is reached, my test program quits (thereby releasing its memory), but the 'free' numbers remain unreasonable.

Finally, our machines do not enter this "missing memory" state at random.
It seems that some users' MPI-based parallel codes can force the machine into that state, while other codes run fine. This suggests that Linux kernel 2.4.9 allows a mere application to royally mess up the system's 'free' numbers. BTW, Red Hat just updated their stable kernels to 2.4.9-31, but has not yet moved to 2.4.17.

Sincerely,
Josip

P.S. Here is a simple program to figure out how many MB malloc() can grant. BTW, malloc() error detection (via errno!=0, via malloc()==NULL, or via the environment variable MALLOC_CHECK_=1) is not very reliable under Linux: the program often gets terminated before printing the final result. This is why it helps to print the number of MB allocated after each successful malloc().

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* sleep() */

#define MAXMB (4 << 10)   /* give up after 4096 MB */
#define MB    (1 << 20)   /* bytes per megabyte */
#define PG    (4 << 10)   /* page size assumed to be 4 KB (x86) */

int main(void)
{
    char *m;
    int i, j;

    printf("PG = %d\n", PG);
    printf("MB = %d\n", MB);
    printf("MAXMB = %d\n\n", MAXMB);
    sleep(3);

    for (i = 0; i < MAXMB; i++) {
        m = malloc(MB);
        printf("%d MB ...", i + 1);
        /* Flush now: with overcommit, malloc() may succeed and the process
           may instead be killed while touching pages, losing buffered output. */
        fflush(stdout);
        /* Test only the return value; errno is not reset before the call,
           so a stale nonzero errno says nothing about this malloc(). */
        if (m == NULL)
            break;
        /* Touch one byte per page so the kernel must really back the block. */
        for (j = 0; j < MB; j += PG)
            m[j] = 'A';
        printf(" OK\n");
    }
    printf("\n\nAllocated %d MB\n", i);
    /* Blocks are deliberately not freed; exiting releases them all. */
    return 0;
}
```

--
Dr. Josip Loncaric, Research Fellow               mailto:josip at icase.edu
ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134
More information about the Beowulf mailing list
