[Beowulf] Logging MCE information on next warm boot?

Henning Fehrmann henning.fehrmann at aei.mpg.de
Wed Jan 27 00:29:58 PST 2010


Hi David,

On Tue, Jan 26, 2010 at 10:46:40AM -0800, David Mathog wrote:
> > David Mathog wrote:
> > Will a logger
> > message for "kern" test it, or is there some other way to force a
> > printk? I'm afraid the logger method might look like it is working, but
> > just go through the usual syslog channels instead of netconsole.
> 
> Too optimistic. With netconsole (supposedly) running on the local node
Correct.
> 
>   logger -p kern.err "test from me"
> 
> only shows up in the log file on that node.  No chance of confusion ;-).
>  There is no explicit network logging of kern.err in /etc/syslog.conf,
> since I figured syslog is never going to be able to actually log
> anything after a kernel error.
> 
> dmesg shows that netconsole started and thinks it is working:
> 
> netconsole: local port 6666
> netconsole: local IP 192.168.1.20
> netconsole: interface eth0
> netconsole: remote port 514
> netconsole: remote IP 192.168.1.220
> netconsole: remote ethernet address 00:30:48:59:f8:ff
> console [netcon0] enabled
> netconsole: network logging started
> 
> However, absolutely nothing comes over netconsole when a node reboots. 
> 
> Searched a lot and finally found out how to test netconsole:
> 
> [root@monkey20 rc6.d]# echo 'p' > /proc/sysrq-trigger
> [root@monkey20 rc6.d]# echo 't' > /proc/sysrq-trigger
> [root@monkey20 rc6.d]# echo 'm' > /proc/sysrq-trigger
> 
> and it generated these on the syslogd machine
> 
> Jan 26 10:21:12 monkey20.cluster SysRq : 
> Jan 26 10:21:12 monkey20.cluster Show Regs 
> Jan 26 10:21:35 monkey20.cluster SysRq : 
> Jan 26 10:21:35 monkey20.cluster Show State 
> Jan 26 10:21:52 monkey20.cluster SysRq : 
> Jan 26 10:21:52 monkey20.cluster Show Memory 
> 
> Notice the contentless messages, which were the same as on the video
> console.  This is a log level issue, change it with dmesg or
> 
> [root@monkey20 rc6.d]# echo '9' > /proc/sysrq-trigger
> [root@monkey20 rc6.d]# echo 'm' > /proc/sysrq-trigger
> 
> and then a pile of memory information shows up on both the syslog side
> and the video console.
> 
> The default log level on these machines is 3.  If the kernel panics with
> it set to that, will the messages that result be "contentless", like the
> ones above?

Hmmm, we have had no kernel panics since we set up netconsole. I also
don't know how much a NIC is affected by a panic.
I tried to find something in the kernel source. At least the panic
message has the log level KERN_EMERG so something should go through.
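
As a rough sketch of why KERN_EMERG should get through: as far as I
understand it, the kernel prints a message on the console only when the
message level is numerically lower than the console log level, and
KERN_EMERG is level 0. The function name below is made up for
illustration; it only models that comparison and does not touch the
kernel.

```shell
# Hypothetical helper: succeed if a message of level $1 would be printed
# on a console whose log level is $2 (assumed rule: msg level < console level).
reaches_console() {
    [ "$1" -lt "$2" ]
}

# KERN_EMERG is 0 and KERN_INFO is 6; the default console level here is 3.
reaches_console 0 3 && echo "KERN_EMERG passes at console level 3"
reaches_console 6 3 || echo "KERN_INFO is filtered at console level 3"
```

So with the default of 3, only levels 0 to 2 would reach netconsole.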

I guess it is a matter of experience. I'd start with log level 7,
which can be reduced at any time.
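
A minimal sketch of checking the level before changing it, assuming the
usual layout of /proc/sys/kernel/printk, where the first field is the
current console log level. The helper name is hypothetical, and the
commands in the comments need root.

```shell
# Hypothetical helper: print the console log level, i.e. the first field
# of the printk sysctl file. The path can be overridden for testing.
console_loglevel() {
    awk '{ print $1 }' "${1:-/proc/sys/kernel/printk}"
}

# Usage on a node, as root (left commented so nothing changes by accident):
#   console_loglevel     # show the current console log level
#   dmesg -n 7           # raise the console log level to 7
#   dmesg -n 3           # go back to the previous default later
```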

Cheers,
Henning


