[Beowulf] Curious about ECC vs non-ECC in practice

Douglas Eadline deadline at eadline.org
Fri May 20 09:35:26 PDT 2011


Joe

While this is somewhat anecdotal, it may be helpful.

Not a large-ish cluster, but as you may guess, I wondered
about this for Limulus
(http://limulus.basement-supercomputing.com)

I wrote a script (will post it if anyone is interested)
that runs memtester until you stop it or it finds
an error. I ran it on several Core2 Duo systems
with Kingston DDR2-800 PC2-6400 memory.
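
For anyone who wants to try something similar, here is a minimal
sketch of that kind of loop in Python. It is not the original script.
The function name run_until_failure, the log_prefix and max_runs
parameters, and the memtester arguments in the usage note are all
assumptions made for illustration. The only behavior relied on is that
memtester exits with status 0 when a run is clean and non-zero when
something goes wrong.

```python
import subprocess
import sys
import time
from datetime import datetime


def run_until_failure(cmd, log_prefix="memtest", max_runs=None):
    """Run cmd repeatedly and stop at the first non-zero exit status.

    Returns a tuple (runs, elapsed_seconds, failed).
    max_runs is a hypothetical safety limit; None means run until
    a failure occurs or the user interrupts the loop.
    """
    start = time.time()
    start_date = datetime.now()
    runs = 0
    while max_runs is None or runs < max_runs:
        runs += 1
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Keep the output of the failing run for later inspection.
            log_name = f"{log_prefix}-{runs}"
            with open(log_name, "w") as log:
                log.write(result.stdout)
                log.write(result.stderr)
            elapsed = int(time.time() - start)
            print(f"There was an error, inspect {log_name}")
            print(f"Start Date was: {start_date:%c}")
            print(f"Failure Date was: {datetime.now():%c}")
            print(f"Test ran {runs} times failing after {elapsed} Seconds")
            return runs, elapsed, True
    return runs, int(time.time() - start), False
```

A call might look like run_until_failure(["memtester", "1024M", "1"]),
which asks for one pass over 1024 MB per run. The size is only an
example; memtester usually needs root (or a raised memlock limit) to
lock that much memory, and Ctrl-C stops the loop.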

As I recall, I ran it on 2-3 systems, and only
one showed an error. I stopped the others
after about three weeks. Here is an example of the
script output when it fails (it logs the
memtester output).

  There was an error, inspect memtest-1178
  Start Date was: Mon Apr 20 16:04:35 EDT 2009
  Failure Date was: Fri May  8 17:55:43 EDT 2009
  Test ran 1178 times failing after 1561868 Seconds
  (26031 Minutes or 433 Hours or 18 Days)
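
The figures in that summary can be cross-checked from the two dates.
The short Python sketch below assumes both timestamps are in the same
time zone (EDT, as printed) and that the minute, hour, and day values
are truncated by integer division, which is consistent with the numbers
shown. The variable names are only illustrative.

```python
from datetime import datetime

# Timestamps copied from the summary above (both EDT).
start = datetime(2009, 4, 20, 16, 4, 35)
failure = datetime(2009, 5, 8, 17, 55, 43)
runs = 1178

seconds = int((failure - start).total_seconds())
minutes = seconds // 60   # truncated, not rounded
hours = minutes // 60
days = hours // 24

print(seconds, minutes, hours, days)  # 1561868 26031 433 18

# Average wall-clock time of one memtester run, in minutes.
avg_minutes = seconds / runs / 60
print(f"{avg_minutes:.1f} minutes per run")
```

So each run took roughly 22 minutes on average, and the failure showed
up only after more than a thousand clean passes.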

My experience in running small clusters
without ECC has been very good. IMO it is also
a question of the quality of the memory vendor.
I never had an issue when running tests and
benchmarks, which I do quite a bit on new
hardware, e.g.

  goo.gl/YoBaz

--
Doug





> Hi folks
>
>    Does anyone run a large-ish cluster without ECC ram?  Or with ECC
> turned off at the motherboard level?  I am curious if there are numbers
> of these, and what issues people encounter.  I have some of my own data
> from smaller collections of systems, I am wondering about this for
> larger systems.
>
>    Thanks!
>
> Joe
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics, Inc.
> email: landman at scalableinformatics.com
> web  : http://scalableinformatics.com
>         http://scalableinformatics.com/sicluster
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>





