To crash or not to crash
Eray Ozkural erayo at cs.bilkent.edu.tr
Thu May 9 20:12:57 PDT 2002
- Previous message: Because XFS is BETTER (Re: opinion on XFS)
- Next message: Because XFS is BETTER (Re: opinion on XFS)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Friday 10 May 2002 00:58, W Bauske wrote:
> Eray Ozkural wrote:
> > It's very easy to crash a node with a suitable code, so I shouldn't have
> > to re-install it or manually fsck it every time it fails to reboot after
> > such a crash...
>
> How do you "easily" crash a node. Are you exceeding some resource
> limit or??
>
> I run quite large problems and don't see problems. Perhaps you mean
> performance grinds to a halt because of paging or something like that
> which makes the node un-responsive so you power cycle it.
>

Well. :) I think it depends on the application, but it's a sure thing that I can't provide you with some minimal code that will freeze any system for good. It does happen from time to time, though, more so on certain kernel version / hardware combinations. It's hard to say when and how those things happen, but exhausting system resources is a good way to disrupt normal operation, as you say. But by crash I mean crash, not a temporary inflation of the working set.

Let me try to give an example of what happens. I sometimes run a large program, i.e. one that uses lots of CPU/disk/network, and a node simply goes down. I'm sure almost everybody has had that kind of thing happen; for instance, some GL programs used to crash XFree86 and the whole system rather easily. The system would lock up or reboot right away... If you write algorithms that use a lot of system resources or do unusual things, you may have done it with your own user-space code, too.

I have never used a system that cannot be crashed :) If you've used such a system, feel free to advertise it, but linux is certainly not like that :) (Maybe the *BSD people would want to praise their systems right now :) ) After all, these kinds of things are to be expected because *nobody* can give a formal proof that the system cannot crash, if you know what I mean. Unless, of course, the whole system was built upon such an invariant, which is not the case.
I'm hoping that this gives a little justification for why you would want a filesystem that will not lose precious files/dirs on an unexpected crash; well, all crashes are unexpected....

Now if the computer that crashes is your home PC, and you are the only user, it may be possible to predict what might crash your system, like when you're testing your uber-kernel-module or superb-ai-algorithm. The problem is that even then you can't guarantee that it won't crash. My claim here is that you can crash a system with appropriate user-space code. On a cluster, the probability that one of the nodes will crash is high.

Of course, I would like to have a system that is wholly immune from crashes, but I think it is a little naive to claim that linux cannot crash. The uptime of some linux boxen does not show that linux is incapable of crashing; it simply shows that the whole system there was in a stable region. Try changing the system components frequently, and you will get a crash. [*]

Now I won't ever say "crash" again. :) Some people here might want to hear me say that "linux cannot crash, and ext2 is the best filesystem ever written", but I won't say it even if Linus Torvalds and gang join this thread :) I doubt they would make such an over-confident statement :) And I still think that ext3 is not the only filesystem that is better than ext2. You could surely say that linux is more stable than, say, any version of windows, which I would wholeheartedly agree with.

Cheers,

[*] Or maybe it might be said that I haven't configured my systems well enough, true, but what's the point of an OS if I have to configure it to prevent it from crashing? :)

--
Eray Ozkural (exa) <erayo at cs.bilkent.edu.tr>
Comp. Sci. Dept., Bilkent University, Ankara
www: http://www.cs.bilkent.edu.tr/~erayo
Malfunction: http://mp3.com/ariza
GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C
More information about the Beowulf mailing list
