kernel oopses
Bill Hilf
bill at hilfworks.com
Tue Jan 29 08:44:46 PST 2002
Robert Latham wrote:
>
> On Mon, Jan 21, 2002 at 06:13:44PM -0800, Martin Siegert wrote:
> > This is somewhat off topic - sorry for that.
>
> it's a great topic for clusters. in an ideal world, the kernel never
> oopses, but when you have N kernels and possibly dodgy hardware, it
> happens.
>
> i get frustrated with this list because topics like Martin's get
> ignored, while topics like cooling with LN2, game console clusters
> and anything athlon get multi-day discussions.
>
> [snip problem report ]
>
> > The first thing I would like to do is to log the oops message. Right now
> > it goes to the console only - it does not appear in the log files
> > although syslog sends everything of severity *.info to /var/log/messages.
>
> i guess you've read Documentation/oops-tracing.txt , but if not, it's
> a good start.
>
> depending on where the panic happens, the part of the kernel that
> would normally write that oops out to disk doesn't run.
>
> So you've got a few options:
>
> . typing off the screen: sucks. a lot. and is highly error prone.
> and the kernel console blanking mechanism might kick in ( and since
> the kernel has panicked, it won't listen for input signals and unblank
> itself ) but if you've got no other option...
>
> ( one time a guy took a picture of the oops with a digital camera and
> sent that to me. that was fun. I don't have any character recognition
> software, but if someone knows of a linux OCR tool that won't mind a
> screenful of hex, i'd like to hear about it )
>
> . serial console: not bad. if it's just one machine, you can pass
> parameters to your kernel and capture all kernel messages over the
> serial port. Documentation/serial-console.txt has all the info you
> need.
>
> . netconsole: http://people.redhat.com/mingo/netconsole-patches/
> like a serial console, but using your network device instead of a
> serial device. It's a kernel patch and a convenience script for the
> sender and a userspace tool for the receiver to display the messages.
> Patching a kernel and setting up yet another tool might be a bit much,
> but man is it cool to see it work :>
>
> . patch your kernel to support "dump log to swapfile" or "dump log to
> disk". I haven't set something like this up, but always meant to
> try it out...
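For the network option, the receiving end can be very small. Below is a minimal sketch of a UDP listener that appends whatever arrives to a log, assuming the sender emits each console message as plain text in a UDP datagram. That wire format, the port number 6666, and the names open_listener and capture are assumptions made for illustration; the netconsole patch above ships its own receiver, and that is the tool to use with it.

```python
import socket
import time


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket. host and port are assumptions; match them to the sender."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def capture(sock, log, max_packets=None):
    """Append each datagram to `log`, one line at a time, prefixed with
    the arrival time and the sender's IP address.

    Runs forever when max_packets is None; returns the packet count otherwise.
    """
    seen = 0
    while max_packets is None or seen < max_packets:
        data, (sender, _port) = sock.recvfrom(65535)
        text = data.decode("ascii", errors="replace")
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for line in text.splitlines():
            log.write(f"{stamp} {sender} {line}\n")
        # A crashed node will not resend anything, so flush after every packet.
        log.flush()
        seen += 1
    return seen
```

To run it, open a log file in append mode and call capture(open_listener(), log). Tagging each line with the sender's address matters on a cluster, where N nodes may all log to the same receiver.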
To expand on the dump-to-disk option, there is the Linux Kernel Crash Dump package:
http://lkcd.sourceforge.net/
and Dprobes (from IBM's Linux Technology Center):
http://oss.software.ibm.com/developer/opensource/linux/projects/dprobes/
... which can also be used with Opersys's Linux trace toolkit
(http://www.opersys.com/LTT/).
And for the truly brave, use gdb ;)
-Bill
> Basically the name of the game is to get that oops into a form you can
> feed to ksymoops, then hope the backtrace it prints out gives you a
> clue. ( like "oh, the last thing it called was do_scsi_service... maybe
> i have a dodgy SCSI controller" ).
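Once the console output is in a file, the oops has to be cut out of the surrounding noise before it goes to ksymoops. Here is a rough sketch of that step, assuming a plain-text capture and an x86 oops that ends with a "Code:" line. The start markers are my guess at the common cases, not an exhaustive list, and extract_oops is a made-up name.

```python
import re

# Text that opens an oops report (assumed markers, not exhaustive).
START = re.compile(r"Unable to handle kernel|Oops:|kernel BUG at")
# The "Code:" line is taken as the end of the report (assumption for x86).
END = re.compile(r"(^|\s)Code:")


def extract_oops(lines):
    """Return a list of oops reports found in `lines`.

    Each report is a list of lines running from the first START match
    through the next "Code:" line.
    """
    reports = []
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if current is None:
            if START.search(line):
                current = [line]
            continue
        current.append(line)
        if END.search(line):
            reports.append(current)
            current = None
    if current:
        # The capture was cut off mid-oops: keep the partial report.
        reports.append(current)
    return reports
```

Each report can then be written to its own file and handed to ksymoops. A truncated report is kept on purpose, since a partial backtrace is still better than nothing.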
>
> Anybody else know of good ways ( even funny bad ways might be
> entertaining) to capture an oops?
>
> ==rob
>
> --
> Rob Latham
> A215 0178 EA2D B059 8CDF
> B29D F333 664A 4280 315B
--
-Bill
<bill at hilfworks.com>
PGP Fingerprint: 4CE0 D72C C7A2 89B2 6B23 03DC B5E9 77CB E6F3 0D2A
http://pgpkeys.mit.edu:11371/pks/lookup?op=get&exact=on&search=0xE6F30D2A