[Beowulf] Re: ECC Memory and Job Failures (Huw Lynes)

Pfenniger Daniel danielpf at gmail.com
Fri Apr 24 14:17:31 PDT 2009


Prentice Bisbal wrote:
> Gerry Creager wrote:
>> David Mathog wrote:
>>> Huw Lynes <lynesh at cardiff.ac.uk> wrote:
>>>
>>>> http://blog.revolution-computing.com/2009/04/blame-it-on-cosmic-rays.html
>>>>
>>>>
>>>> Apparently someone ran a large cluster job with both ECC and non-ECC
>>>> RAM. They consistently got the wrong answer when forgoing ECC.
>>> There were not very many details given.  I would not rule out the
>>> possibility that the nonECC memory was slightly faulty, and that the
>>> observed errors had nothing to do with gamma rays at all.  A better test
>>> would have been to use the same ECC memory for both tests, and to turn
>>> ECC memory correction on and off in the BIOS.
>> Where's Jim Lux?  I'm sure he has an opinion on this, too...
>>
> 
> Opinion? I think he could write a book on this topic!
> 
> Last time this issue came up, he included links to several papers on
> this topic published by Boeing. As you go up in the atmosphere, the
> [prevalence|probability|concentration] of cosmic rays goes up
> significantly. Boeing has done a lot of research on this topic, since it
> can affect the operation of their [products|weapons].
> 
> 
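On David's point about testing the same memory with ECC on and off: one crude way to look for uncorrected bit flips on a node is to fill a large buffer with a known pattern, let it sit in RAM, and then count the bits that changed. Below is a minimal Python sketch of that idea. The function names and the 0x55 pattern are hypothetical choices for illustration, not anything from the blog post, and a userspace scan like this is no substitute for a proper memory tester.

```python
# Crude soft-error scan (hypothetical helpers): fill a buffer with a known
# bit pattern, wait, then count bits that no longer match the pattern.
import time

PATTERN = 0x55  # 01010101: alternating bits, an arbitrary choice


def fill_buffer(size_bytes: int, pattern: int = PATTERN) -> bytearray:
    """Allocate a buffer and set every byte to the given pattern."""
    return bytearray([pattern]) * size_bytes


def count_flipped_bits(buf: bytearray, pattern: int = PATTERN) -> int:
    """Count bits that differ from the expected pattern across the buffer."""
    # Fast path: bytearray.count is done in C, so a clean buffer is cheap.
    if buf.count(pattern) == len(buf):
        return 0
    # Slow path: XOR each mismatched byte with the pattern, count the 1 bits.
    return sum(bin(b ^ pattern).count("1") for b in buf if b != pattern)


def scan(size_bytes: int, wait_seconds: float) -> int:
    """Fill a buffer, let it sit for wait_seconds, return flipped-bit count."""
    buf = fill_buffer(size_bytes)
    time.sleep(wait_seconds)  # let the data sit in RAM
    return count_flipped_bits(buf)
```

For example, `scan(256 * 1024 * 1024, 3600)` would hold 256 MiB for an hour. With ECC enabled, single-bit errors are corrected by the hardware on read, so this check should report zero; with ECC disabled they would show up as nonzero counts, which is the contrast David's on/off test is after.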


I once took a radiation detector (an RM-60 from aw-el.com) attached
to an early "netbook" (an Atari Portfolio) on a flight at 11,000 m
and recorded the radiation level.


From memory, the radiation level in the cabin increased from about
12-17 µR/h, the natural background level on the ground, to over
300 µR/h, roughly 20 times higher.
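A quick check of the "about 20 times" figure, using the remembered readings above at face value (in µR/h). Since the cabin reading was "over 300", the 300 used here is a lower bound.

```python
# Ratio of cabin to ground-level exposure rate from the readings above (µR/h).
ground_low, ground_high = 12.0, 17.0  # ground-level background range
cabin = 300.0                         # cruise altitude (lower bound, "over 300")

ratio_max = cabin / ground_low                          # against the low ground reading
ratio_min = cabin / ground_high                         # against the high ground reading
ratio_mid = cabin / ((ground_low + ground_high) / 2)    # against the midpoint, 14.5

print(f"{ratio_min:.1f}-{ratio_max:.1f}x, midpoint {ratio_mid:.1f}x")
# prints "17.6-25.0x, midpoint 20.7x"
```

So "about 20 times" sits in the middle of the plausible range of roughly 18 to 25.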


Since the natural radiation dose accumulated over a lifetime corresponds
to a semi-lethal dose if received all at once, I would think that for
crews working for years in airplanes, the accumulated radiation from
cosmic rays may be significant.
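To put a rough number on the crew case, here is a back-of-the-envelope sketch that takes the detector readings at face value (14.5 µR/h on the ground, 300 µR/h aloft) and assumes, hypothetically, 600 hours per year at cruise altitude. These are exposure readings from one handheld detector, not a dose estimate.

```python
# Extra exposure accumulated aloft vs. a full year of ground background.
# Readings in µR/h; HOURS_ALOFT_PER_YEAR is a hypothetical figure.
HOURS_PER_YEAR = 365.25 * 24  # 8766 h
HOURS_ALOFT_PER_YEAR = 600    # assumption, for illustration only

ground = 14.5   # µR/h, midpoint of the 12-17 ground readings
cabin = 300.0   # µR/h at ~11,000 m


def extra_exposure_R(cabin_rate: float, ground_rate: float, hours: float) -> float:
    """Exposure (roentgen) above ground background accumulated while aloft."""
    return (cabin_rate - ground_rate) * hours / 1e6  # µR -> R


def ground_background_R(ground_rate: float, years: float) -> float:
    """Exposure (roentgen) from ground background over the given years."""
    return ground_rate * HOURS_PER_YEAR * years / 1e6


extra = extra_exposure_R(cabin, ground, HOURS_ALOFT_PER_YEAR)
background = ground_background_R(ground, 1)
print(f"extra aloft: {extra:.3f} R/yr, ground background: {background:.3f} R/yr, "
      f"ratio {extra / background:.2f}")
# prints "extra aloft: 0.171 R/yr, ground background: 0.127 R/yr, ratio 1.35"
```

Under these assumptions, flying alone would add somewhat more than a full year's ground background every year.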


	Dan


