intermittent crashing of programs

Patrick Geoffray patrick at myri.com
Thu Feb 21 07:28:03 PST 2002


Hi Kris,

Kris Thielemans wrote:
> Any suggestions on how we figure out what the problem is (aside from
> replacing all memory chips)? Is it necessarily RAM, or could it be e.g. the
> hard disk controller or so?

It's usually RAM, but it can also be a PCI device raising the NMI. For
example, I have seen NMIs from SCSI boards that had to wait too long to
access the PCI bus.

The last time I got one, the cause was a bad RAM chip that memtest did
not detect. Try swapping memory with another node to see whether the
NMIs migrate with the chips.
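
To make the memory swap suggested above easier to judge, here is a minimal
sketch, not a tested diagnostic. It assumes each node exposes an "NMI:" line
in /proc/interrupts with per-CPU counters, and that you note how many new NMIs
each node logs over a comparable period before and after the swap. The
function names and the node names (node3, node7) are hypothetical.

```python
def nmi_count(interrupts_text):
    """Sum the per-CPU counters on the NMI line of /proc/interrupts text.

    Returns 0 if no NMI line is present.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            total = 0
            for field in fields[1:]:
                if not field.isdigit():
                    break  # reached the trailing description text
                total += int(field)
            return total
    return 0


def fault_follows_memory(before, after, suspect, partner):
    """Decide whether the NMIs moved with the swapped memory.

    before/after map node name -> number of new NMIs seen in the
    observation period before and after the swap (assumed bookkeeping).
    The fault "follows the memory" if the suspect node goes quiet and
    the partner node, now holding the suspect's chips, starts logging NMIs.
    """
    return before[suspect] > 0 and after[suspect] == 0 and after[partner] > 0


if __name__ == "__main__":
    # Hypothetical counts: node3 was failing, its memory went to node7.
    before = {"node3": 4, "node7": 0}
    after = {"node3": 0, "node7": 2}
    print(fault_follows_memory(before, after, "node3", "node7"))
```

On a live node the input would come from open("/proc/interrupts").read().
If the NMIs stay on the original node after the swap, the memory is probably
not at fault and a PCI device becomes the more likely suspect.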

Patrick
----------------------------------------------------------
|   Patrick Geoffray, Ph.D.      patrick at myri.com
|   Myricom, Inc.                http://www.myri.com
|   Cell:  865-389-8852          685 Emory Valley Rd (B)
|   Phone: 865-425-0978          Oak Ridge, TN 37830
----------------------------------------------------------



