[Beowulf] Seeing ECC errors since upgraded from Opteron 246 to 275
Paulo Afonso Lopes pal at di.fct.unl.pt
Sun Aug 24 04:48:00 PDT 2008
- Previous message: [Beowulf] Seeing ECC errors since upgraded from Opteron 246 to 275
- Next message: [Beowulf] reboot without passing through BIOS?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
> On Wed, Aug 06, 2008 at 02:56:51PM -0500, Jason Clinton wrote:
>
>> We have a tool on our website called "breakin" that is Linux 2.6.25.9
>> patched with K8 and K10f Opteron EDAC reporting facilities. It can
>> usually find and identify failed RAM in fifteen minutes (two hours at
>> most). The EDAC patches to the kernel aren't that great about naming
>> the correct memory rank, though.
>>
>> Make sure you have multibit (sometimes says 4-bit) ECC enabled in your
>> BIOS.
>>
>> http://www.advancedclustering.com/software/breakin.html
>
> I just gave this a try, and it seems to be a very nicely packaged
> utility. Thanks for making it available. I've used some similar stuff
> before, but this is really easy.
>
> -- greg
>

After more than a week of testing I can assert :-) that the cause was
poor power, as the UPS was operating outside its envelope. Since I
re-distributed the load, moving some nodes to other UPS'es, errors went
away.

Thanks for all the suggestions,
paulo

--
Paulo Afonso Lopes                  | Tel: +351- 21 294 8536
Departamento de Informática         | 294 8300 ext.10763
Faculdade de Ciências e Tecnologia  | Fax: +351- 21 294 8541
Universidade Nova de Lisboa         | e-mail: pal at di.fct.unl.pt
2829-516 Caparica, PORTUGAL
