robl at mcs.anl.gov
Tue Jan 29 07:39:33 PST 2002
On Mon, Jan 21, 2002 at 06:13:44PM -0800, Martin Siegert wrote:
> This is somewhat off topic - sorry for that.
it's a great topic for clusters. in an ideal world, the kernel never
oopses, but when you have N kernels and possibly dodgy hardware, it
i get frustrated with this list because topics like Martin's get
ignored, while topics like cooling with LN2, game console clusters
and anything athlon get multi-day discussions.
[snip problem report ]
> The first thing I would like to do is to log the oops message. Right now
> it goes to the console only - it does not appear in the log files
> although syslog sends everything of severity *.info to /var/log/messages.
i guess you've read Documentation/oops-tracing.txt , but if not, it's
a good start.
depending on where the panic happens, the part of the kernel that
would normally write that oops out to disk doesn't run.
So you've got a few options:
. typing off the screen: sucks. a lot. and is highly error prone.
and the kernel console blanking mechanism might kick in ( and since
the kernel has panicked, it won't listen for input signals and unblank
itself ) but if you've got no other option...
( one time a guy took a picture of the oops with a digital camera and
sent that to me. that was fun. I don't have any character recognition
software, but if someone knows of a linux OCR tool that won't mind a
screenful of hex, i'd like to hear about it )
. serial console: not bad. if it's just one machine, you can pass
parameters to your kernel and capture all kernel messages over the
serial port. Documentation/serial-console.txt has all the info you need.
. netconsole: http://people.redhat.com/mingo/netconsole-patches/
like a serial console, but using your network device instead of a
serial device. It's a kernel patch and a convenience script for the
sender and a userspace tool for the receiver to display the messages.
Patching a kernel and setting up yet another tool might be a bit much,
but man is it cool to see it work :>
. patch your kernel to support "dump log to swapfile" or "dump log to
disk". I haven't set something like this up, but always meant to
try it out...
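
To make the netconsole option a bit more concrete, here is a rough sketch
of what a receiving side could look like. It is not the userspace tool
that ships with the patches. It assumes the sender emits kernel messages
as plain-text UDP datagrams, which may not match the wire format of the
patch set linked above, and the port number (6666), the log file name and
the function names are all made up for illustration.

```python
import socket
import time


def format_record(data, addr, now=None):
    """Turn one UDP datagram into timestamped lines tagged with the sender IP."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    # Oops text is ASCII; replace anything odd rather than crash the logger.
    text = data.decode("ascii", errors="replace")
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    return ["%s %s %s" % (stamp, addr[0], ln) for ln in lines]


def serve(port=6666, logfile="oops-capture.log"):
    """Listen forever and append every received message to logfile.

    Port and file name are assumptions, not values taken from the patches.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    with open(logfile, "a") as out:
        while True:
            data, addr = sock.recvfrom(65535)
            for line in format_record(data, addr):
                out.write(line + "\n")
            # Flush right away: a panicking node may never send again.
            out.flush()
```

Calling serve() on one log host lets N nodes send to it, and the sender
address on each line tells you which node oopsed.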
Basically the name of the game is to get that oops into a form you can
feed to ksymoops, then hope the backtrace it prints out gives you a
clue. ( like "oh, the last thing it called was do_scsi_service... maybe
i have a dodgy scsi controller" ).
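
Whichever way the console output gets captured, the oops usually ends up
buried in other noise. A minimal sketch of pulling it back out so it can be
handed to ksymoops might look like the following. The start and end markers
are assumptions based on what a typical oops looks like (an "Unable to
handle kernel ..." or "Oops:" line at the top, a "Code:" line at the
bottom); other architectures or kernel versions may print something
different, and extract_oopses is just a name picked for the example.

```python
import re

# Assumed markers for the first and last line of an oops report.
START = re.compile(r"Unable to handle kernel|Oops:|kernel BUG at")
END = re.compile(r"\bCode:")


def extract_oopses(lines):
    """Return a list of oops reports, each one a list of text lines."""
    reports = []
    current = None
    for line in lines:
        line = line.rstrip("\r\n")
        if current is None:
            # Not inside a report yet: wait for a start marker.
            if START.search(line):
                current = [line]
        else:
            current.append(line)
            if END.search(line):
                reports.append(current)
                current = None
    if current:
        # Capture was cut off before the Code: line; keep what we have.
        reports.append(current)
    return reports
```

Something like extract_oopses(open("console.log")) gives back each report
as a list of lines, which can be written to a file and fed to ksymoops.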
Anybody else know of good ways ( even funny bad ways might be
entertaining) to capture an oops?