[Beowulf] ECC exerciser/exorciser?

Greg Lindahl lindahl at pbm.com
Mon Jan 26 13:53:10 PST 2009


On Mon, Jan 26, 2009 at 10:30:50AM -0500, Mark Hahn wrote:

> - first, how would you go about setting a threshold for how high is an
> acceptable CE count?  we by default are using the mce module, which by  
> default polls at 1Hz.  my thinking is that if we get overflow events
> (the multiple error bit is set), then it's too fast.

The number of these events should be about zero if you're near sea
level. Almost all of my hundreds of 32 GB systems show no MCEs.

At significant altitude (5000+ feet), I don't know the current number
for this generation of memory, but it's probably << 1/week/system.
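
To make the threshold question concrete, here is a minimal sketch of
turning a corrected-error count into a per-week rate and flagging the
systems that exceed a limit. The function names, the sample hosts, and
the default limit of 1 per week are illustrative assumptions, not
anything the mce module provides; the limit is meant as a generous
upper bound, since the expected rate is well below it.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def ce_rate_per_week(ce_count, uptime_seconds):
    """Return corrected errors per week for one system."""
    if uptime_seconds <= 0:
        raise ValueError("uptime_seconds must be positive")
    return ce_count * SECONDS_PER_WEEK / uptime_seconds


def flag_systems(samples, max_per_week=1.0):
    """Return the sorted hosts whose CE rate exceeds max_per_week.

    samples maps a host name to (ce_count, uptime_seconds).
    max_per_week is an assumed limit; tune it for your site.
    """
    return sorted(
        host
        for host, (count, uptime) in samples.items()
        if ce_rate_per_week(count, uptime) > max_per_week
    )


if __name__ == "__main__":
    # Hypothetical data: two hosts, each up for exactly one week.
    week = SECONDS_PER_WEEK
    print(flag_systems({"node01": (0, week), "node02": (5, week)}))
```

How the counts are collected (mce log, EDAC counters, or something
else) is left out on purpose; the sketch covers only the arithmetic.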

I'm curious about the comments suggesting that the HPL on the "burnin"
CD isn't as good as running HPL yourself. Very odd.

And if you're going to use stream or other programs for testing, do
keep in mind that loading all the cores at once seems to be very
important for provoking errors.
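
As a rough illustration of loading every core at once, here is a
minimal sketch that starts one worker per core, each filling a buffer
with an alternating bit pattern and reading it back. It is not stream
and generates far less memory traffic than a real benchmark; the
names, buffer size, and pass count are assumptions chosen for the
example. With ECC, corrected errors will not show up as mismatches
here, so the MCE log is still where to look while this runs.

```python
import multiprocessing as mp


def exercise(args):
    """Fill a buffer with a pattern, read it back, count mismatches."""
    size_bytes, passes = args
    mismatches = 0
    for p in range(passes):
        # Alternate 0x55 and 0xAA so every bit is written both ways.
        pattern = 0x55 if p % 2 == 0 else 0xAA
        buf = bytearray([pattern]) * size_bytes
        # Any byte that no longer matches the pattern is a mismatch.
        mismatches += size_bytes - buf.count(bytes([pattern]))
    return mismatches


def run_all_cores(size_bytes=64 * 1024 * 1024, passes=4, workers=None):
    """Run exercise() in one process per core and sum the mismatches."""
    workers = workers or mp.cpu_count()
    with mp.Pool(workers) as pool:
        return sum(pool.map(exercise, [(size_bytes, passes)] * workers))


if __name__ == "__main__":
    print("mismatches:", run_all_cores())
```

The per-worker buffer is sized well above typical cache sizes so that
the reads and writes actually reach main memory.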

-- greg




