[Beowulf] Not quite Walmart, or, living without ECC?

Tony Travis ajt at rri.sari.ac.uk
Tue Nov 27 13:56:34 PST 2007

David Mathog wrote:
> Tony Travis wrote:
>> Memtest86+ is fine for 'burn-in' tests, but it does not do a realistic
>> memory stress test under the conditions in which normal applications run.
> Wow, deja vu.  I just remembered we had almost exactly this same
> discussion 2 years ago, in fact I apparently sent you my hacked up
> version of memtester which has delays in it between the write and read
> cycles, to allow it to catch bit fade (due to radiation or whatever).

Hello, David.

Yes, I remember ;-)

> One thing I still don't get though, if memtester is catching memory
> errors which only appear when _other parts of the system are active_
> does replacing the "bad" memory actually cure these problems?  That is,
> if memtest86+ runs cleanly and memtester finds problems, is it really
> the memory which is the issue?

Yes, replacing the faulty memory does fix the problem in the majority of
cases. However, memory is not always the culprit: I've had to replace a
couple of faulty CPUs. I do think memtester is a much more realistic
stress test, but because it runs as an ordinary process under the
operating system it can only test the memory it is able to allocate, so
you can't use it to test memory exhaustively like you can with
memtest86+. You still need to do both tests. I also run memtester at
random times as a confidence-building exercise :-)
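
For anyone who has not seen the write/delay/read idea David mentions, here
is a minimal sketch of it in Python. It is an illustration only, not
memtester's code and not David's patch: the function names, the pattern
set and the default sizes are all my own assumptions. A user-space script
like this also goes through the CPU caches and virtual memory, so it shows
the shape of the test rather than being a serious memory tester.

```python
import time


def find_mismatches(buf, pattern):
    """Return (offset, expected, actual) for every byte that differs from pattern."""
    return [(off, pattern, b) for off, b in enumerate(buf) if b != pattern]


def fade_test(size_bytes=1 << 20, delay_s=1.0, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Hypothetical write/delay/read loop (names and defaults are assumptions).

    For each pattern: fill the buffer, sit idle for delay_s seconds, then
    read the buffer back and record any byte that no longer matches.
    """
    buf = bytearray(size_bytes)
    errors = []
    for pattern in patterns:
        buf[:] = bytes([pattern]) * size_bytes   # write phase
        time.sleep(delay_s)                      # idle gap between write and read
        errors.extend(find_mismatches(buf, pattern))  # read/verify phase
    return errors


if __name__ == "__main__":
    bad = fade_test(size_bytes=1 << 20, delay_s=1.0)
    print("mismatches found:", len(bad))
```

A longer delay_s gives a stored pattern more time to decay before it is
checked, which is the point of putting a gap between the write and the read.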

Best wishes,

Dr. A.J.Travis,                     |  mailto:ajt at rri.sari.ac.uk
Rowett Research Institute,          |    http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn,          |   phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK.    |     fax:+44 (0)1224 716687
