[Beowulf] ECC settings for Opteron 175 + Serverworks HT1000 chipset

Bruce Allen ballen at gravity.phys.uwm.edu
Wed Jan 25 17:41:02 PST 2006


Dear Beowulf list,

Our new cluster nodes (Supermicro H8SSL-i motherboard) have Opteron 175 
CPUs, unregistered ECC memory DIMMs, and a ServerWorks HT1000 chipset. 
The BIOS offers a number of ECC configuration options.  I would like 
advice about how to set these.  We're running a recent Linux kernel.

My goals are (1) to have logging in syslog that helps identify whether 
a particular DIMM is producing a lot of ECC errors and (2) to ensure 
that memory errors are corrected to the maximum extent possible without 
too large an impact on system performance.
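
To make goal (1) concrete, here is a minimal sketch of the kind of
per-DIMM accounting I have in mind.  It assumes the kernel's EDAC
subsystem is available and that a driver for this memory controller is
loaded, so that counters appear under /sys/devices/system/edac/mc (I do
not know yet whether that is the case on these nodes).  The function
names and the threshold of 10 are my own illustrative choices, and how
a csrow maps onto a physical DIMM slot depends on the board.

```python
from pathlib import Path

# Assumed location of the Linux EDAC sysfs tree; it is present only
# when an EDAC driver for the memory controller is loaded.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_int(path):
    """Return the integer stored in a sysfs file, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def collect_ecc_counts(root=EDAC_ROOT):
    """Map 'mcN/csrowM' to its corrected (ce) and uncorrected (ue) counts."""
    counts = {}
    for csrow in sorted(Path(root).glob("mc*/csrow*")):
        ce = read_int(csrow / "ce_count")
        ue = read_int(csrow / "ue_count")
        if ce is None and ue is None:
            continue  # not a counter directory we can read
        key = f"{csrow.parent.name}/{csrow.name}"
        counts[key] = {"ce": ce or 0, "ue": ue or 0}
    return counts


def suspicious_rows(counts, ce_threshold=10):
    """Rows with any uncorrected error, or many corrected ones.

    The threshold is a hypothetical value chosen for illustration.
    """
    return sorted(
        name
        for name, c in counts.items()
        if c["ue"] > 0 or c["ce"] >= ce_threshold
    )


if __name__ == "__main__":
    counts = collect_ecc_counts()
    for name, c in counts.items():
        print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
    print("suspicious:", suspicious_rows(counts))
```

Run periodically (for example from cron) and fed to syslog, something
like this would show whether errors cluster on one chip-select row.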

The BIOS ECC Configuration options are:

   ECC enable (we'll use 'enabled')
   MCA DRAM ECC logging (enable/disable)
   ECC Chip Kill (enable/disable)
   DRAM Scrub Redirect (enable/disable)
   DRAM BG Scrub (disable/time in NSEC)
   L2 Cache BG Scrub (disable/time in NSEC)
   Data Cache BG Scrub (disable/time in NSEC)
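
To get a feel for the trade-off in the background scrub settings, the
following rough sketch estimates how long one full pass over DRAM takes
for a given scrub interval, and how much memory bandwidth it uses.  It
assumes that the scrubber handles one 64-byte line per interval, which
is my reading of how the Opteron scrubber works and should be checked
against AMD's documentation.  The 4 GiB memory size and the example
intervals are hypothetical.

```python
# Assumption: one 64-byte line is scrubbed per programmed interval.
LINE_BYTES = 64


def full_scrub_seconds(memory_bytes, interval_ns, line_bytes=LINE_BYTES):
    """Time for the scrubber to visit every line of memory once."""
    lines = memory_bytes // line_bytes
    return lines * interval_ns * 1e-9


def scrub_bandwidth_mb_per_s(interval_ns, line_bytes=LINE_BYTES):
    """Memory bandwidth consumed by scrubbing, in MB/s (10**6 bytes)."""
    return line_bytes / (interval_ns * 1e-9) / 1e6


if __name__ == "__main__":
    memory_bytes = 4 * 2**30  # hypothetical node with 4 GiB of DRAM
    for interval_ns in (640, 5120, 81920):  # example intervals only
        secs = full_scrub_seconds(memory_bytes, interval_ns)
        bw = scrub_bandwidth_mb_per_s(interval_ns)
        print(f"{interval_ns:>6} ns: full pass in {secs:8.1f} s, {bw:6.2f} MB/s")
```

A shorter interval finds latent single-bit errors sooner but takes more
bandwidth away from applications, which is the balance I am asking
about in goal (2).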

I would appreciate advice about:
   -- how to configure these settings
   -- pointers to relevant AMD/ServerWorks documentation
   -- relevant Linux kernel options/modules
   -- anything else relevant/related

Cheers,
 	Bruce


