[Beowulf] Logging MCE information on next warm boot?

Eric W. Biederman ebiederm at xmission.com
Mon Jan 25 16:17:07 PST 2010


Greg Keller <Greg at keller.net> writes:

>> Date: Mon, 25 Jan 2010 10:46:31 -0800
>> From: "David Mathog" <mathog at caltech.edu>
>> Subject: [Beowulf] Logging MCE information on next warm boot?
>> To: beowulf at beowulf.org
>> Message-ID: <E1NZTxH-00035U-1F at mendel.bio.caltech.edu>
>> Content-Type: text/plain; charset=iso-8859-1
>>
>> Is it possible to have the Machine Check Exception (MCE) information
>> saved to disk automatically on the next warm boot?
>
> David,
>
> I believe the utility you are looking for is mcelog.  We usually run it with
> the following arguments:
> /usr/sbin/mcelog -h --ignorenodev --filter
>
> I think it clears the info after it reports it, so make sure to tee it to a
> file.  I don't understand the command or the flags, just a copy/paste script
> kiddy in these regards, but I hope it helps.

In the case of a panic this won't work.  You would need to set up kdump or
something like that to capture the panic.
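
For the non-panic case, the "tee it to a file" advice quoted above could be
sketched in Python roughly as follows.  This is only an illustration under
stated assumptions: the helper name save_command_output and the log path are
made up for this example, and the mcelog command line is copied verbatim from
the quoted message, not verified here.

```python
import subprocess
from datetime import datetime, timezone

# Copied verbatim from the quoted message; the flags are not verified here.
MCELOG_CMD = ["/usr/sbin/mcelog", "-h", "--ignorenodev", "--filter"]


def save_command_output(cmd, log_path):
    """Run cmd once, append anything it prints to log_path, return the text.

    Hypothetical helper: it keeps a copy on disk because the reporting tool
    may clear the records once it has printed them.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    text = result.stdout + result.stderr
    if text:
        stamp = datetime.now(timezone.utc).isoformat()
        with open(log_path, "a", encoding="utf-8") as log:
            # One timestamped header per run, then the raw output.
            log.write(f"--- {stamp} ---\n{text}")
            if not text.endswith("\n"):
                log.write("\n")
    return text


# Example call (assumed log path, needs root and mcelog installed):
# save_command_output(MCELOG_CMD, "/var/log/mce-boot.log")
```

Run early from a boot script, something like this would keep whatever the tool
reports on disk.  It does nothing for the panic case, which still needs kdump
or similar.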

This sounds like L1 or L2 cache corruption, but I haven't ever had any
machine checks on anything before the K8 core.  Wow.  You are talking about
old machines.

Even if the machine check registers are kept across a reboot, there is a
reasonable chance that the firmware clears them.

Eric


