[Beowulf] Not quite Walmart, or, living without ECC?

Jim Lux James.P.Lux at jpl.nasa.gov
Mon Nov 26 16:19:59 PST 2007

At 01:15 PM 11/26/2007, Bruno Coutinho wrote:

>I heard that the major source of memory corruption in servers is the 
>memory bus.
>And this becomes worse as you add memory sticks.
>With 8 memory sticks that have 8 chips on both sides, you have 128 chips.
>So the main purpose of ECC is correcting bus errors.

This is a real possibility. The raw error rate on the chips is quite low.

Mike Sanor, compatibility and performance manager at Crucial 
Technology, a division of DRAM manufacturer Micron Technology that 
sells memory directly to end users, is quoted as saying:

ECC is most useful for "servers and precision workstations, but not 
commodity desktops. The reason is simple: The error rate in today's 
consumer-level memory is so low that for most everyday 
applications, adding ECC is pure overkill. For standard DDR2 memory, 
the error rate is something like 100 soft errors over 1 billion 
device hours. If there are 16 memory devices or chips on a given 
module, that translates to one soft error every 30 years. Even if you 
only have two such DIMMs in a system, that's still less than one 
error for more than the lifetime of the system as a whole."
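As a rough check on the quoted numbers, the sketch below turns a per-device soft error rate into a mean time between errors for a given chip count. The quote does not state the model, so this sketch assumes one: every chip fails independently at a constant rate, which lets the per-chip rates simply add. The function name years_between_errors and the listed configurations are illustrative only.

```python
# Hours in an average year (including leap years): 8766.
HOURS_PER_YEAR = 24 * 365.25

# Rate quoted above for standard DDR2: ~100 soft errors per 1e9 device-hours.
ERRORS_PER_DEVICE_HOUR = 100 / 1e9


def years_between_errors(devices: int, rate: float = ERRORS_PER_DEVICE_HOUR) -> float:
    """Mean time between soft errors, in years, for `devices` chips.

    Assumption (not from the quote): chips fail independently at a
    constant rate, so the system-wide rate is devices * rate.
    """
    system_rate = devices * rate  # expected errors per hour, whole system
    return 1.0 / system_rate / HOURS_PER_YEAR


if __name__ == "__main__":
    configs = [
        ("one 16-chip DIMM", 16),
        ("two 16-chip DIMMs", 32),
        ("8 double-sided sticks, 8 chips per side", 8 * 8 * 2),
    ]
    for label, devices in configs:
        print(f"{label}: {devices} devices, ~{years_between_errors(devices):.1f} years")
```

With the quoted rate, this gives roughly 71 years for a single 16-chip module. Two such modules come out at about 36 years, which is closer to the quoted "30 years". The 128-chip configuration from the question above comes out at about 9 years.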
