[Beowulf] Intel microcode updates for "erratum" - "an incorrect instruction stream may be executed"
tim at buttersideup.com
Mon Jul 26 05:15:18 PDT 2004
I see that Intel has noted an "erratum" which, it says, means that "an
incorrect instruction stream may be executed". The erratum has shown up
on the latest specification update documents, and seems to affect recent
P4s (stepping D1/M0 - versions 0x0f25, 0x0f29), P4 Celerons (stepping D1
- 0x0f29), P4 Xeons (all 400/533MHz bus CPUs, I think), and some recent
XeonMPs (stepping B0/C0 - 0x0f25, 0x0f26).
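As a rough aid for checking machines against the list above, here is a minimal Python sketch that splits a CPUID signature into family, model and stepping and compares it with the signatures quoted in this message. The names decode_signature, is_listed and LISTED_SIGNATURES are illustrative only. The set holds just the values mentioned above and does not cover the Xeon case, which is described by bus speed rather than by signature, so Intel's specification update remains the authority.

```python
# Signatures quoted in this message only; NOT a complete list of affected parts.
LISTED_SIGNATURES = {0x0F25, 0x0F26, 0x0F29}


def decode_signature(sig):
    """Split a CPUID leaf-1 EAX signature into (family, model, stepping)."""
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only used for base family 0x6 or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


def is_listed(sig):
    """True if the signature is one of those quoted in this message."""
    return sig in LISTED_SIGNATURES


if __name__ == "__main__":
    for sig in sorted(LISTED_SIGNATURES):
        print(hex(sig), decode_signature(sig), is_listed(sig))
```

For example, decode_signature(0x0f29) gives family 15, model 2, stepping 9.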
I was wondering if anyone running large clusters had seen any problems
attributable to this bug - and whether people with heavy vendor support
had been given any advice about it - particularly how often the problem
is likely to occur?
I've noticed that IBM and HP have released BIOSes for some machines
which include new microcode to fix this erratum.
The erratum text follows:
> Problem: A Timing Marginality in the Instruction Decoder Unit May
> Cause an Unpredictable Application Behavior and/or System Hang
> A timing marginality may exist in the clocking of the instruction
> decoder unit which leads to a circuit slowdown in the read path from
> the Instruction Decode PLA circuit. This timing marginality may not be
> visible for some period of time.
> Implication: When this erratum occurs, an incorrect instruction stream
> may be executed resulting in an unpredictable application behavior
> and/or system hang
> Workaround: It is possible for the BIOS to contain a workaround for
> this erratum. BIOS must load the microcode update during the BIOS POST
> time prior to memory initialization.
> Status: For the steppings affected, see the Summary Table of Changes.
The "cpuid" utility - http://www.ka9q.net/code/cpuid/ - will tell you
which version your CPU is, and there is an Intel Microcode update
utility for Linux:
which can be used to load new microcode from user space. It doesn't
appear to have the latest microcode yet, although it's difficult to
tell given Intel's lack of changelog/release notes for the microcode
files. Running the latest microcode.ctl on an IBM machine which has the
fixed BIOS says:
microcode: CPU0 not 'upgrading' to earlier revision 0x17 (current=0x21)
microcode: No suitable data for cpu 0
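The first message can be read as a comparison of two revision numbers: the utility is offering 0x17 while the CPU already carries 0x21, presumably loaded by the fixed BIOS. A small Python sketch, assuming the log line has exactly the form shown above (the helper name parse_refusal is made up for illustration), pulls those numbers out:

```python
import re

# Matches only the exact message form quoted in this post; other kernel
# versions may word the message differently.
_REFUSAL = re.compile(
    r"microcode: CPU(\d+) not 'upgrading' to earlier revision "
    r"(0x[0-9a-fA-F]+) \(current=(0x[0-9a-fA-F]+)\)"
)


def parse_refusal(line):
    """Return (cpu, offered_rev, current_rev), or None if the line differs."""
    match = _REFUSAL.search(line)
    if match is None:
        return None
    cpu = int(match.group(1))
    offered = int(match.group(2), 16)   # revision shipped with the utility
    current = int(match.group(3), 16)   # revision already loaded in the CPU
    return cpu, offered, current


if __name__ == "__main__":
    line = ("microcode: CPU0 not 'upgrading' to earlier revision 0x17 "
            "(current=0x21)")
    cpu, offered, current = parse_refusal(line)
    print(f"CPU{cpu}: offered {offered:#x}, already running {current:#x}")
```

Run over the logs of a whole cluster, something like this would at least show which nodes already carry a newer revision than the utility offers.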
Intel's erratum text says "BIOS must load the microcode update during
the BIOS POST time prior to memory initialization". Whether that
means "this is the general policy for microcode updates" or "this is
necessary to fix this particular erratum" is not clear; hopefully
it is the former...