intermittent crashing of programs
Patrick Geoffray
patrick at myri.com
Thu Feb 21 07:28:03 PST 2002
Hi Kris,
Kris Thielemans wrote:
> Any suggestions on how we figure out what the problem is (aside from
> replacing all memory chips)? Is it necessarily RAM, or could it be e.g. the
> hard disk controller or so?
It's usually RAM, but it can also be a PCI device whining. I have seen
NMIs from SCSI boards when they were waiting too long to access the PCI
bus, for example.
The last time I got one, it was a bad RAM chip and memtest didn't find
anything. Try to swap memory with another node to see if the NMIs
migrate with the chips.
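To tell whether the NMIs follow the chips, it helps to record the NMI count on each node before and after the swap. Here is a minimal sketch, assuming Linux nodes whose /proc/interrupts has an "NMI:" line with one counter per CPU. The function names are made up for illustration, and the exact layout of that file can differ between kernels.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters found in /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            # The counters are the leading integer fields; some kernels
            # append a text description after them, so stop at the first
            # non-numeric field.
            for field in fields[1:]:
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    return []  # no NMI line present


def read_nmi_counts(path="/proc/interrupts"):
    """Read the current NMI counters from the given file (assumed path)."""
    with open(path) as f:
        return nmi_counts(f.read())


def nmi_delta(before, after):
    """Number of NMIs that occurred between two snapshots, over all CPUs."""
    return sum(after) - sum(before)
```

Take a snapshot with read_nmi_counts() on both nodes, run the usual jobs for a while, take another, and compare nmi_delta() per node. If the node that received the suspect chips is now the one whose count climbs, the memory is the likely culprit.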
Patrick
----------------------------------------------------------
| Patrick Geoffray, Ph.D. patrick at myri.com
| Myricom, Inc. http://www.myri.com
| Cell: 865-389-8852 685 Emory Valley Rd (B)
| Phone: 865-425-0978 Oak Ridge, TN 37830
----------------------------------------------------------