[Beowulf] Seeing ECC errors since upgraded from Opteron 246 to 275
Greg Lindahl
lindahl at pbm.com
Sat Aug 23 16:51:42 PDT 2008
On Wed, Aug 06, 2008 at 02:56:51PM -0500, Jason Clinton wrote:
> We have a tool on our website called "breakin" that is Linux 2.6.25.9
> patched with K8 and K10f Opteron EDAC reporting facilities. It can
> usually find and identify failed RAM in fifteen minutes (two hours at
> most). The EDAC patches to the kernel aren't that great about naming
> the correct memory rank, though.
>
> Make sure you have multibit (sometimes labeled 4-bit) ECC enabled in your BIOS.
>
> http://www.advancedclustering.com/software/breakin.html
I just gave this a try, and it seems to be a very nicely packaged
utility. Thanks for making it available. I've used some similar stuff
before, but this is really easy.
-- greg