D-Link switch and ecc-memory.

Josip Loncaric josip at icase.edu
Wed Jan 17 09:56:55 PST 2001


"Robert G. Brown" wrote:
> 
> [on single bit errors:]  It is still interesting
> that this is a noticeable effect, though -- it suggests that radiation is
> actually an important cause and not just an incidental cause.

There are two main sources of soft errors: cosmic rays and radioactive
contaminants, each with its own soft-error rate (SER).  Cosmic rays have
enough energy to penetrate the DRAM packaging and flip a bit.  "They are
so energetic that they can penetrate our atmosphere (equivalent to 13
feet of concrete), and then penetrate through the ceiling into a
multistory building," say Ziegler et al.  Trace amounts of radioactive
isotopes trapped inside DRAM packaging can also create errors.  See
Ziegler et al., "IBM experiments in soft fails in computer electronics
(1978-1994)" at

  http://www.research.ibm.com/journal/rd/ziegl/ziegler.html

This IBM report says the following regarding the altitude effect:

"T. O'Gorman began the first field test of the cosmic ray SER of chips.
[...] His results showed a distinct altitude-dependent SER, with a SER
increase of more than 10x going from sea level to two miles up (Figure
5)."
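
To get a feel for what that figure could mean at intermediate
altitudes, here is a rough Python sketch.  It assumes the SER grows
exponentially with altitude and takes exactly 10x at two miles as the
calibration point.  Both are assumptions made for illustration (the
report only says "more than 10x"), and ser_multiplier is a hypothetical
helper name, not anything from the IBM report.

```python
import math

def ser_multiplier(altitude_miles, factor_at_ref=10.0, ref_miles=2.0):
    """Relative SER versus sea level.

    ASSUMPTION: SER grows exponentially with altitude, calibrated so that
    the multiplier equals factor_at_ref at ref_miles. The quoted report
    gives only the two-mile figure ("more than 10x").
    """
    # Growth constant per mile implied by the calibration point.
    k = math.log(factor_at_ref) / ref_miles
    return math.exp(k * altitude_miles)

if __name__ == "__main__":
    for miles in (0.0, 0.5, 1.0, 2.0):
        print(f"{miles:.1f} mi: about {ser_multiplier(miles):.1f}x sea level")
```

Under that assumption a site one mile up would see roughly the square
root of the two-mile factor.  Figure 5 of the report has the measured
curve.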
 
The good news is that memory reliability has improved a lot.  Cosmic ray
soft error sensitivity per bit decreased 2000-fold between 1983 and 1992
(see the IBM report).  IBM's model predicted cosmic ray soft-error
sensitivity of about 20-30 (fails per (year?) per chip) for 16 Mb chips
made in 1991 (see fig. 12).
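
As a back-of-the-envelope check on the 2000-fold figure, the Python
sketch below converts it into an average yearly improvement.  It assumes
the improvement was spread evenly (geometrically) over the nine years
from 1983 to 1992, which is a simplification; the function names are
hypothetical.

```python
import math

def yearly_improvement(total_factor=2000.0, years=1992 - 1983):
    """Average per-year reduction factor, ASSUMING steady geometric
    improvement over the whole period."""
    return total_factor ** (1.0 / years)

def halving_time_years(total_factor=2000.0, years=1992 - 1983):
    """Time for per-bit sensitivity to halve, under the same assumption."""
    return years * math.log(2.0) / math.log(total_factor)

if __name__ == "__main__":
    print(f"average improvement per year: {yearly_improvement():.2f}x")
    print(f"sensitivity halves every {halving_time_years() * 12:.1f} months")
```

Under that assumption, per-bit sensitivity dropped by a bit more than a
factor of two each year.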

Sincerely,
Josip

-- 
Dr. Josip Loncaric, Senior Staff Scientist        mailto:josip at icase.edu
ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134



