
[Beowulf] Not quite Walmart, or, living without ECC?


Jim Lux James.P.Lux at jpl.nasa.gov
Mon Nov 26 16:19:59 PST 2007


At 01:15 PM 11/26/2007, Bruno Coutinho wrote:

>I heard that the major source of memory corruption in servers is the
>memory bus.
>And this becomes worse as you add memory sticks.
>With 8 memory sticks that have 8 chips on each side, you have 128 chips.
>So the main purpose of ECC is correcting bus errors.


This is a real possibility. The raw error rate on the chips is quite low.

Mike Sanor, compatibility and performance manager at Crucial
Technology, a division of DRAM manufacturer Micron Technology that
sells memory directly to end users, is quoted as saying:

ECC is most useful for "servers and precision workstations, but not
commodity desktops. The reason is simple: The error rate in today's
consumer-level memory is so low that for most everyday
applications, adding ECC is pure overkill. For standard DDR2 memory,
the error rate is something like 100 soft errors over 1 billion
device hours. If there are 16 memory devices or chips on a given
module, that translates to one soft error every 30 years. Even if you
only have two such DIMMs in a system, that's still less than one
error for more than the lifetime of the system as a whole."
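
As a rough check on the arithmetic in that quote, here is a minimal Python sketch. It assumes soft errors are independent and arrive at a constant rate per device, and that the quoted rate of 100 errors per billion device hours applies to every chip. The function name and the particular device counts are illustrative choices, not something from the quoted source.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours


def years_between_soft_errors(errors_per_billion_device_hours, devices):
    """Mean years between soft errors across a group of memory devices.

    Assumes independent errors at a constant per-device rate, so the
    rates of the individual devices simply add.
    """
    errors_per_hour = errors_per_billion_device_hours / 1e9 * devices
    return 1.0 / errors_per_hour / HOURS_PER_YEAR


if __name__ == "__main__":
    rate = 100  # soft errors per 1e9 device hours, the figure in the quote
    cases = [
        ("one 16-chip DIMM", 16),
        ("two 16-chip DIMMs", 32),
        ("8 sticks, 8 chips per side, 2 sides", 8 * 8 * 2),
    ]
    for label, devices in cases:
        years = years_between_soft_errors(rate, devices)
        print(f"{label}: about {years:.1f} years between soft errors")
```

Under these assumptions a single 16-chip module comes out nearer 71 years than 30, and two modules come out around 36 years, so the quoted "30 years" presumably rests on a somewhat different rate or chip count. The 128-chip configuration from Bruno's example works out to roughly one chip-level soft error every 9 years, which is consistent with the raw chip error rate being low.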
