[Beowulf] Tyan S2882

Gebhardt Thomas gebhardt at hrz.uni-marburg.de
Fri Sep 29 04:30:52 PDT 2006


Thanks for your reply!

On Thursday 28 September 2006 16:02, you wrote:
> I bet if you decode the MCE it will say uncorrectable ECC memory error.

You'd win that bet.

> memtest86 doesn't see correctable memory errors.

As far as I can remember, memtest86 includes tests that also detect
correctable ECC errors.

> It sounds like you have a pile of correctable (soft?) memory errors that
> occasionally become uncorrectable.

Yes, we do. But about 75% of our nodes have never shown correctable ECC errors,
and some of those crashed. On the other hand, we have nodes with a bunch of
correctable ECC errors that have been stable since day one.

Cheers, Thomas

