[Beowulf] Logging MCE information on next warm boot?
Henning Fehrmann
henning.fehrmann at aei.mpg.de
Wed Jan 27 00:29:58 PST 2010
Hi David,
On Tue, Jan 26, 2010 at 10:46:40AM -0800, David Mathog wrote:
> > David Mathog wrote:
> > Will a logger
> > message for "kern" test it, or is there some other way to force a
> > printk? I'm afraid the logger method might look like it is working, but
> > just go through the usual syslog channels instead of netconsole.
>
> Too optimistic. With netconsole (supposedly) running on the local node
Correct.
>
> logger -p kern.err "test from me"
>
> only shows up in the log file on that node. No chance of confusion ;-).
> There is no explicit network logging of kern.err in /etc/syslog.conf,
> since I figured syslog is never going to be able to actually log
> anything after a kernel error.
>
> dmesg shows that netconsole started and thinks it is working:
>
> netconsole: local port 6666
> netconsole: local IP 192.168.1.20
> netconsole: interface eth0
> netconsole: remote port 514
> netconsole: remote IP 192.168.1.220
> netconsole: remote ethernet address 00:30:48:59:f8:ff
> console [netcon0] enabled
> netconsole: network logging started
>
> However, absolutely nothing comes over netconsole when a node reboots.
>
> Searched a lot and finally found out how to test netconsole:
>
> [root at monkey20 rc6.d]# echo 'p' > /proc/sysrq-trigger
> [root at monkey20 rc6.d]# echo 't' > /proc/sysrq-trigger
> [root at monkey20 rc6.d]# echo 'm' > /proc/sysrq-trigger
>
> and it generated these on the syslogd machine
>
> Jan 26 10:21:12 monkey20.cluster SysRq :
> Jan 26 10:21:12 monkey20.cluster Show Regs
> Jan 26 10:21:35 monkey20.cluster SysRq :
> Jan 26 10:21:35 monkey20.cluster Show State
> Jan 26 10:21:52 monkey20.cluster SysRq :
> Jan 26 10:21:52 monkey20.cluster Show Memory
>
> Notice the contentless messages, which were the same as on the video
> console. This is a log level issue, change it with dmesg or
>
> [root at monkey20 rc6.d]# echo '9' > /proc/sysrq-trigger
> [root at monkey20 rc6.d]# echo 'm' > /proc/sysrq-trigger
>
> and then a pile of memory information shows up on both the syslog side
> and the video console.
>
> The default log level on these machines is 3. If the kernel panics with
> it set to that, will the messages that result be "contentless", like the
> ones above?
Hmmm, we have had no kernel panics since we set up netconsole. I also
don't know how much a NIC is affected by a panic.
I tried to find something in the kernel source. At least the panic
message has the log level KERN_EMERG, so something should get through.
I guess it is a matter of experience. I'd start with log level 7,
which can be reduced at any time.
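To make the log level reasoning concrete, here is a minimal sketch. It relies on the kernel's rule that a message reaches the consoles (netconsole included) only when its level is numerically lower than console_loglevel, which is the first field of /proc/sys/kernel/printk. The helper name reaches_console is hypothetical, and the fallback value of 3 is simply the default reported above for these machines.

```shell
#!/bin/sh
# Sketch only: decide whether a kernel message of a given level would be
# emitted to the console for a given console log level.
# Rule used: a message is shown when msg_level < console_loglevel.

# reaches_console MSG_LEVEL CONSOLE_LOGLEVEL  (hypothetical helper)
# Prints "yes" or "no".
reaches_console() {
    if [ "$1" -lt "$2" ]; then
        echo yes
    else
        echo no
    fi
}

# Use the live console log level when available; otherwise assume 3,
# the default mentioned in this thread.
current=3
if [ -r /proc/sys/kernel/printk ]; then
    read -r current _rest < /proc/sys/kernel/printk || current=3
fi

echo "console_loglevel=$current"
echo "KERN_EMERG (0) shown: $(reaches_console 0 "$current")"
echo "KERN_INFO  (6) shown: $(reaches_console 6 "$current")"
```

At level 3 this reports that a KERN_EMERG message gets through while a KERN_INFO message does not. At level 7 everything except KERN_DEBUG is shown. Whether the NIC is still able to send the packet during a panic is a separate question that this sketch does not address.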
Cheers,
Henning