Take any two: motherboard performance, compatibility, value

Bob Drzyzgula bob at drzyzgula.org
Wed Jun 28 15:39:54 PDT 2000


On Wed, Jun 28, 2000 at 10:28:58PM +0200, Jakob Østergaard wrote:
> On Wed, 28 Jun 2000, Josip Loncaric wrote:
> 
> > So we are back to square one, i.e. 440BX SMP motherboards and PC100
> > RAM.  This is Not Good.  High RAM bandwidth is essential, particularly
> > on dual P3/800 machines (faster clock, smaller cache)...
> 
> We bought a new dual 550 PIII at work recently, and ended up using good
> old Asus P2B-D (BX based).  It seemed to be the only real affordable 
> and known-stable solution.

Just one more indication of how badly Intel screwed up with the
whole RDRAM fiasco. Improvements in this arena have just been
stalled for months.

> > BTW, I see that ECC corrects about one single bit error per month in
> > 12GB of RAM.  Our total system will have close to 40GB, so errors could
> > pop up weekly, which is why we need ECC.  
> 
> Are you absolutely certain that ECC RAM on PC hardware actually *corrects*
> bit errors ?
> 
> There was a short discussion on this subject on the linux-kernel list some
> weeks ago, where someone stated that ECC RAM (for PCs) can only *detect* a
> parity error and offer you an NMI when that occurs. No one seemed to object to
> this.

The last thing I am is an expert on this, but, quoting
Intel's 440BX web page at

  http://developer.intel.com/design/intarch/techinfo/440BX/BX_arch.htm

] The Intel® 440BX AGPset also provides DIMM plug-and-play
] support via Serial Presence Detect (SPD) mechanism using
] the SMBus interface. The 82443BX provides optional
] data integrity features including ECC in the memory
] array. During reads from DRAM, the 82443BX provides
] error checking and correction of the data. The 82443BX
] supports multiple-bit error detection and single-bit error
] correction when ECC mode is enabled and single/multi-bit
] error detection when correction is disabled. During
] writes to the DRAM, the 82443BX generates ECC for the
] data on a QWord basis. Partial QWord writes require a
] read-modify-write cycle when ECC is enabled.

In these PC architectures, I don't think that there is any
ECC generation on-module like there is in some architectures;
there is only sufficient bit storage to allow the chipset
to generate the somewhat-redundant codes and store those.
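
To make the QWord-level scheme concrete, here is a minimal Python
sketch of a single-error-correcting, double-error-detecting
(SEC-DED) code over 64 data bits plus 8 check bits, i.e. the 72
bits an ECC DIMM stores per QWord. It is a textbook extended
Hamming code chosen for illustration; I am not claiming it is the
check-bit layout the 82443BX actually uses, and the names encode,
decode, PARITY_POSITIONS and DATA_POSITIONS are my own.

```python
# Illustrative SEC-DED (72,64) extended Hamming code.
# Assumption: this is NOT the 82443BX's real check-bit matrix.

# Codeword positions 1..71 form a Hamming code; powers of two hold
# check bits. Position 0 holds an overall parity bit.
PARITY_POSITIONS = (1, 2, 4, 8, 16, 32, 64)
DATA_POSITIONS = [p for p in range(1, 72) if p not in PARITY_POSITIONS]


def _syndrome(word: int) -> int:
    """XOR together the positions (1..71) of all set bits."""
    s = 0
    for pos in range(1, 72):
        if (word >> pos) & 1:
            s ^= pos
    return s


def encode(qword: int) -> int:
    """Return a 72-bit codeword (as an int) for a 64-bit value."""
    assert 0 <= qword < (1 << 64)
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (qword >> i) & 1:
            word |= 1 << pos
    # Set check bits so that the syndrome of the codeword is zero.
    s = _syndrome(word)
    for p in PARITY_POSITIONS:
        if s & p:
            word |= 1 << p
    # Overall parity bit makes the total number of ones even.
    if bin(word).count("1") % 2:
        word |= 1
    return word


def decode(word: int):
    """Return (qword, status); status is ok/corrected/uncorrectable."""
    s = _syndrome(word)
    odd = bin(word).count("1") % 2 == 1
    if s == 0 and not odd:
        status = "ok"
    elif odd and s <= 71:
        # One flipped bit; s names its position (0 = parity bit).
        word ^= 1 << s
        status = "corrected"
    else:
        # Nonzero syndrome with even parity: at least two bits flipped.
        status = "uncorrectable"
    qword = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> pos) & 1:
            qword |= 1 << i
    return qword, status
```

Flipping any one of the 72 stored bits and calling decode() gives
back the original value with status "corrected"; flipping two
gives "uncorrectable", which is the case where the chipset can
only report the error rather than fix it.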

Whether the motherboard manufacturers, BIOS writers and
operating systems configure the chipset properly to take
advantage of this, or do anything interesting with any
information provided by the chipset is another matter
entirely. I would expect, for example, that the chipset
would raise some sort of alert if a single-bit ECC error
was detected and corrected; certainly the OS would want
to log such an event. Depending on the motherboard, BIOS
and OS, it would certainly be possible to treat such an
alert exactly the same as one would treat a double-bit
error, or a single-bit error when ECC is turned off,
e.g. NMI. It's also possible, I suppose, that the ECC
generation and detection in the 82443BX doesn't work worth
a damn and thus most 440BX designs leave it turned off.
I have no reason to believe this is true, however.

FWIW.

--Bob Drzyzgula



