[Beowulf] Barcelona hardware error: how to detect


Greg Lindahl lindahl at pbm.com
Thu Jun 5 11:30:20 PDT 2008


On Thu, Jun 05, 2008 at 10:09:58PM +0400, Mikhail Kuzminsky wrote:

> This was interesting for me also, because I 
> have no information how this hardware problem may be affected in the 
> "real life". 

I have 4 chips with the bug, in 2 servers. I see about 1 lockup per
month with my workload, which doesn't include any VMs. (VMs are
reputed to trigger the bug quickly.) I found a webpage with the
details, and indeed this is what I see:

| The system may experience a machine check event reporting an L3
| protocol error has occurred. In this case, the MC4 status register
| (MSR 0000_0410) will be equal to B2000000_000B0C0F or
| BA000000_000B0C0F. The MC4 address register (MSR 0000_0412) will be
| equal to 26h.

-- greg
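
As an illustration of the check described in the quoted text, here is a minimal Python sketch that compares a pair of register values against the quoted signature. It is not part of the original post. The function and constant names are hypothetical, and the sketch assumes the MC4 status and address values have already been obtained by some other means (for example from a machine check log). It does not read the MSRs itself.

```python
# Signature values copied from the quoted description above.
# All names here are hypothetical and chosen only for this example.
L3_PROTOCOL_ERROR_STATUS = {0xB2000000000B0C0F, 0xBA000000000B0C0F}
L3_PROTOCOL_ERROR_ADDR = 0x26


def parse_msr_value(text):
    """Parse a hex register value such as 'B2000000_000B0C0F' or '26h'."""
    cleaned = text.strip().replace("_", "")
    if cleaned.lower().endswith("h"):
        cleaned = cleaned[:-1]  # drop the trailing 'h' hex suffix
    return int(cleaned, 16)


def matches_l3_protocol_error(mc4_status, mc4_addr):
    """Return True if both values match the quoted L3 protocol error signature."""
    return (mc4_status in L3_PROTOCOL_ERROR_STATUS
            and mc4_addr == L3_PROTOCOL_ERROR_ADDR)


if __name__ == "__main__":
    status = parse_msr_value("BA000000_000B0C0F")
    addr = parse_msr_value("26h")
    print(matches_l3_protocol_error(status, addr))
```

Both conditions are required to match, since the quoted description gives the status and address values together.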





