[Beowulf] Logging MCE information on next warm boot?
Eric W. Biederman
ebiederm at xmission.com
Mon Jan 25 16:17:07 PST 2010
Greg Keller <Greg at keller.net> writes:
>> Date: Mon, 25 Jan 2010 10:46:31 -0800
>> From: "David Mathog" <mathog at caltech.edu>
>> Subject: [Beowulf] Logging MCE information on next warm boot?
>> To: beowulf at beowulf.org
>> Message-ID: <E1NZTxH-00035U-1F at mendel.bio.caltech.edu>
>> Content-Type: text/plain; charset=iso-8859-1
>>
>> Is it possible to have the Machine Check Exception (MCE) information
>> saved to disk automatically on the next warm boot?
>
> David,
>
> I believe the utility you are looking for is mcelog. We usually run it with
> the following arguments:
> /usr/sbin/mcelog -h --ignorenodev --filter
>
> I think it clears the info after it reports it, so make sure to tee it to a
> file. I don't understand the command or the flags, just a copy / paste script
> kiddy in these regards, but I hope it helps.
In the case of a panic this won't work. You would need to set up kdump or
something like it to capture the panic.

This sounds like L1 or L2 cache corruption, but I have never had any
machine checks on anything before the K8 core. Wow. You are talking about
old machines.

If the machine check registers are kept across a reboot, there is a
reasonable chance that the firmware clears them.
Eric
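
For the non-panic case, Greg's advice to tee the mcelog output to a file
can be sketched roughly as below. This is only an illustration, not a
tested recipe: the helper name log_command_output and the log path in the
usage note are made up for this example, the mcelog arguments are copied
unverified from Greg's message, and it only helps if the machine check
registers actually survive the reboot. It does nothing for the panic case,
which still needs kdump or similar.

```python
import datetime
import subprocess
import sys


def log_command_output(cmd, log_path):
    """Run cmd, echo its output, and append it to log_path (like `tee -a`).

    log_command_output is a hypothetical helper name, not part of mcelog.
    """
    # Capture stdout and stderr together so nothing the tool reports is lost.
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    # Append rather than overwrite, so earlier boots stay in the log.
    with open(log_path, "a") as log:
        log.write(f"=== {stamp} {' '.join(cmd)} (exit {result.returncode}) ===\n")
        log.write(result.stdout)
    # Echo to the console as well, as tee would.
    sys.stdout.write(result.stdout)
    return result.stdout
```

From a boot script run as root, the call might look like
log_command_output(["/usr/sbin/mcelog", "-h", "--ignorenodev", "--filter"],
"/var/log/mce-boot.log"), where the log path is an assumption. Appending
matters here because, as Greg notes, the information may be cleared once
it has been reported.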