memory leak
Robert G. Brown rgb at phy.duke.edu
Thu Dec 19 07:11:58 PST 2002
On Wed, 18 Dec 2002, Brian LaMere wrote:

> The NFS server is running the proprietary OS from EMC named "dart" they use
> on their Celeras (and possibly other things). It had a firmware-ish update
> during November to the NAS code for fix a user mapping bug, but that's about
> it. The Celera is a cabinet that does nothing other than nfs and cifs.
> While I didn't cripple the whole cabinet, I did cripple a datamover inside
> it (the primary datamover for the filesystems I was accessing).
>
> I just checked, and there have been no configuration changes on there in the
> last couple months

Just as a matter of extreme humorosity, you might try picking a box with enough disk to hold the file(s) you are serving, copy them over, export them, and redirect all your nodes to mount from it instead (which probably wouldn't take as long as it sounds -- it is pretty trivial to set up an NFS server and push a hacked /etc/fstab to the nodes). Run it that way for a day or eight and see if it matters (problem resurfaces).

BTW, you've just revealed (if I understand you correctly) that you're using a proprietary OS, closed source, black box NFS server. Ordinarily, I'd say that this is a bad idea, for precisely the reasons you are now in trouble: it may be IMPOSSIBLE for you to positively determine where your problem lies, unless you are fortunate enough to find a trivial problem and fix it. If it is a very slow memory leak or other "deep bug", or even a disagreement/incompatibility between your server and the mounting clients, how could >>you<< ever tell? How could you fix it? How can you even convince the vendor/mfr that the bug exists and is their fault so THEY can fix it? The answer is pretty universally not without a lot of work and finger pointing on everybody's part.

One thing I would NO LONGER suggest that you do is take the problem to the kernel list. They tend to be a tiny bit intolerant of bug reports involving proprietary interfaces (hardware, software, peripheral) because of the obvious difficulties in determining who owns the problem and where it has to be fixed. Sometimes they'll listen, but sometimes they just don't want to waste their time. Sigh.

It is going to be very difficult to debug this if it isn't (your) hardware. With a black box, it will be very difficult to debug if it IS hardware -- inside the black box. With a black box, you'll never debug it if it is a bug in the black box software -- at best you'll be able to convince yourself that it isn't a problem with your nodes per se and find a workaround (e.g. build your own NFS server, which is cheap'n'easy enough, and give your BB server to the poor) or MAYBE convince the company that they own the problem and stimulate a fix.

Open source vs closed source, hmmm....;-)

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
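
For anyone wanting to try the "hacked /etc/fstab" experiment suggested above, here is a minimal sketch of the rewrite step only. It is not from the original post: the function name and the hostnames `celerra` and `testnfs` are hypothetical, and it assumes the nodes use ordinary whitespace-separated fstab entries with `nfs` or `nfs4` in the type field. It does not set up the replacement server or push the file to the nodes.

```python
def redirect_nfs_mounts(fstab_text, old_server, new_server):
    """Return fstab_text with NFS mounts on old_server pointed at new_server.

    Hypothetical helper: comments, blank lines, and non-NFS entries
    are passed through unchanged.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 3 and not line.lstrip().startswith("#")
        if (is_entry
                and fields[2] in ("nfs", "nfs4")
                and fields[0].startswith(old_server + ":")):
            # Keep the export path and mount options, swap only the host.
            export_path = fields[0][len(old_server) + 1:]
            fields[0] = f"{new_server}:{export_path}"
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical node fstab; the hostnames are placeholders.
    sample = (
        "/dev/hda1 / ext3 defaults 1 1\n"
        "celerra:/data /data nfs rw,hard,intr 0 0\n"
    )
    print(redirect_nfs_mounts(sample, "celerra", "testnfs"), end="")
```

Keeping a copy of the original fstab on each node makes it easy to switch back once the day-or-eight test run is over.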
More information about the Beowulf mailing list
