intermittent crashing of programs

Patrick Geoffray patrick at myri.com
Thu Feb 21 07:28:03 PST 2002


Hi Kris,

Kris Thielemans wrote:
> Any suggestions on how we figure out what the problem is (aside from
> replacing all memory chips)? Is it necessarily RAM, or could it be e.g. the
> hard disk controller or so?

It's usually RAM, but it can also be a PCI device whining. For example, 
I have seen NMIs from SCSI boards that had to wait too long for access 
to the PCI bus.

The last time I got one, it was a bad RAM chip and memtest didn't find 
anything. Try swapping memory with another node to see if the NMIs 
migrate with the chips.
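
To tell whether the NMIs follow the chips, it helps to note the NMI count 
on both nodes before and after the swap. Here is a minimal sketch, assuming 
Linux nodes where /proc/interrupts has an "NMI:" row with one counter per 
CPU. The function names are hypothetical, not from any existing tool.

```python
def parse_nmi_count(interrupts_text):
    """Sum the per-CPU NMI counters found in the text of /proc/interrupts.

    Assumes the usual Linux layout: a row labelled "NMI:" followed by one
    counter per CPU and, on some kernels, a trailing description.
    Returns None when no NMI row is present.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            total = 0
            for field in fields[1:]:
                # Counters come first; stop at the first non-numeric field,
                # which is the optional description text.
                if not field.isdigit():
                    break
                total += int(field)
            return total
    return None


def read_nmi_count(path="/proc/interrupts"):
    """Read the current NMI total for this node (path is an assumption)."""
    with open(path) as f:
        return parse_nmi_count(f.read())
```

Calling read_nmi_count() on each node every so often and logging the result 
with the hostname should show which node is accumulating NMIs, and whether 
that changes once the memory has been swapped.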

Patrick
----------------------------------------------------------
|   Patrick Geoffray, Ph.D.      patrick at myri.com
|   Myricom, Inc.                http://www.myri.com
|   Cell:  865-389-8852          685 Emory Valley Rd (B)
|   Phone: 865-425-0978          Oak Ridge, TN 37830
----------------------------------------------------------



