[Beowulf] Memory Testing?

Michael Di Domenico mdidomenico4 at gmail.com
Tue Aug 9 05:46:13 PDT 2011

The last discussion on the list about faulty memory centered on using
software such as memtest or HPL to trigger SBEs.

I'm curious whether anyone has experience with uncorrectable ECC errors:
specifically, not detecting that an error occurred, but determining
which DIMM in the chassis it points to.

mcelog on Linux doesn't seem to report the DIMM slot correctly on
my Supermicro boards.

The only way I know to narrow it down is to pull all the DIMMs
and then test them in the system one at a time.

I'm curious if there is a better way, or if anyone has any opinions on
the below (or another similar) piece of hardware that might do the


