[Beowulf] Not quite Walmart, or, living without ECC?
Jim Lux
James.P.Lux at jpl.nasa.gov
Mon Nov 26 16:19:59 PST 2007
At 01:15 PM 11/26/2007, Bruno Coutinho wrote:
>I heard that the major source of memory corruption in servers is the
>memory bus.
>And this becomes worse as you add memory sticks.
>With 8 memory sticks that have 8 chips on each side, you have 128 chips.
>So the main purpose of ECC is correcting bus errors.
This is a real possibility. The raw error rate on the chips is quite low.
Mike Sanor, compatibility and performance manager at Crucial
Technology, a division of DRAM manufacturer Micron Technology that
sells memory directly to end users, is quoted as saying:
ECC is most useful for "servers and precision workstations, but not
commodity desktops. The reason is simple: The error rate in today's
consumer-level memory is so low that for most everyday
applications, adding ECC is pure overkill. For standard DDR2 memory,
the error rate is something like 100 soft errors over 1 billion
device hours. If there are 16 memory devices or chips on a given
module, that translates to one soft error every 30 years. Even if you
only have two such DIMMs in a system, that's still less than one
error for more than the lifetime of the system as a whole."
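To see how the chip count scales the expected soft-error interval, here is a rough back-of-the-envelope sketch. It assumes every chip fails independently at the single quoted rate (100 soft errors per 10^9 device-hours), ignores bus errors entirely, and uses the hypothetical helper name years_between_errors. Read literally, that rate gives roughly 71 years for one 16-chip module rather than the quoted 30, so the 30-year figure presumably rests on a rate or device count not stated in the quote. Treat these numbers as illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years; close enough for an estimate


def years_between_errors(errors_per_billion_hours: float, devices: int) -> float:
    """Mean years between soft errors for `devices` chips, assuming each chip
    fails independently at `errors_per_billion_hours` (per 1e9 device-hours)."""
    errors_per_hour = errors_per_billion_hours * devices / 1e9
    return 1.0 / errors_per_hour / HOURS_PER_YEAR


RATE = 100  # quoted DDR2 figure: ~100 soft errors per 1e9 device-hours

# Chip counts from the thread: one 16-chip DIMM, two such DIMMs,
# and the 8-stick, 8-chips-per-side (128-chip) server case.
for label, chips in [("one 16-chip DIMM", 16),
                     ("two 16-chip DIMMs", 32),
                     ("128 chips (8 sticks)", 128)]:
    print(f"{label:22s} {years_between_errors(RATE, chips):6.1f} years")
```

Under these assumptions the intervals come out to about 71, 36, and 9 years, respectively: the rate scales linearly with chip count, so a heavily populated server sees chip-level soft errors far more often than a two-DIMM desktop, before any bus-level errors are counted.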