[Beowulf] Seeing ECC errors since upgraded from Opteron 246 to 275
lindahl at pbm.com
Sat Aug 23 16:51:42 PDT 2008
On Wed, Aug 06, 2008 at 02:56:51PM -0500, Jason Clinton wrote:
> We have a tool on our website called "breakin" that is Linux 18.104.22.168
> patched with K8 and K10f Opteron EDAC reporting facilities. It can
> usually find and identify failed RAM in fifteen minutes (two hours at
> most). The EDAC patches to the kernel aren't that great about naming
> the correct memory rank, though.
> Make sure you have multibit (sometimes labeled 4-bit) ECC enabled in your BIOS.
I just gave this a try, and it seems to be a very nicely packaged
utility. Thanks for making it available. I've used similar tools
before, but this one is really easy to use.
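For anyone who wants to look at the same EDAC counters outside breakin, here is a minimal sketch. It assumes a Linux kernel with an EDAC memory-controller driver loaded that exposes the standard sysfs counters under /sys/devices/system/edac/mc: ce_count and ue_count per controller, plus csrowN/ce_count per chip-select row on kernels that still provide the csrow interface. It only reads counts the kernel has already accumulated; it does not stress memory the way breakin does. The helper names (read_counter, edac_error_counts) are hypothetical.

```python
import glob
import os


def read_counter(path):
    """Return the integer stored in a sysfs counter file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def edac_error_counts(root="/sys/devices/system/edac/mc"):
    """Collect corrected (ce) and uncorrected (ue) error counts per memory
    controller, plus per-csrow corrected counts where the kernel exposes them.
    Assumes the legacy EDAC sysfs layout; returns {} if no controllers exist."""
    report = {}
    for mc in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        entry = {
            "ce": read_counter(os.path.join(mc, "ce_count")),
            "ue": read_counter(os.path.join(mc, "ue_count")),
            "csrows": {},
        }
        # Each csrowN directory is one chip-select row on this controller.
        for row in sorted(glob.glob(os.path.join(mc, "csrow[0-9]*"))):
            entry["csrows"][os.path.basename(row)] = read_counter(
                os.path.join(row, "ce_count"))
        report[os.path.basename(mc)] = entry
    return report


if __name__ == "__main__":
    counts = edac_error_counts()
    if not counts:
        print("No EDAC memory controllers found (is an EDAC driver loaded?)")
    for mc, entry in counts.items():
        print(mc, "corrected:", entry["ce"], "uncorrected:", entry["ue"])
        for row, ce in entry["csrows"].items():
            if ce:
                # Only report rows that have logged corrected errors.
                print("  ", row, "corrected:", ce)
```

Running it after a burn-in shows which csrow is accumulating corrected errors. As noted above, the EDAC mapping from rank to physical slot is not always right, so treat the csrow as a hint rather than a definitive DIMM location.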