intermittent crashing of programs

Patrick Geoffray patrick at myri.com
Thu Feb 21 09:09:11 PST 2002


Donald Becker wrote:

> Could you elaborate?  What PCI problems cause a NMI, and on which
> motherboards.  You obviously have some first-hand experience with the
> problem.  I'm guessing that you have helped many customers debug their
> hardware problems.

I have seen it on the x330 and the Supermicro DLE: the SCSI board would
assert SERR on the PCI bus, and the system would translate it into an
NMI. NMIs are very hard to debug because it is hard to tell what their
source is.
For this specific SCSI problem, we used a PCI analyser and noticed the
SERR. I am not 100% sure why the SCSI board was dying with a SERR, but
it happened after the board had requested the bus and was waiting for a
long DMA in progress by another PCI device to finish. Replacing the
SCSI card was the solution in this case.
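
Without a PCI analyser, one software-side hint is the Status register in
each device's PCI configuration space (offset 0x06), where bit 14 records
that the device signaled a system error. The sketch below is not what we
ran at the time; it only decodes a 16-bit Status value that you have
already read by some other means (for example from an lspci dump). The
function names and the sample value 0x4280 are made up for illustration.

```python
# Error bits of the 16-bit PCI Status register (config-space offset 0x06).
PCI_STATUS_ERROR_BITS = {
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error (device asserted SERR#)",
    15: "Detected Parity Error",
}


def decode_pci_status(status):
    """Return the names of the error bits set in a PCI Status value."""
    return [
        name
        for bit, name in sorted(PCI_STATUS_ERROR_BITS.items())
        if status & (1 << bit)
    ]


def signaled_serr(status):
    """True if the device reports that it asserted SERR# (bit 14)."""
    return bool(status & (1 << 14))


if __name__ == "__main__":
    sample = 0x4280  # hypothetical value, not taken from a real machine
    print(hex(sample), decode_pci_status(sample))
```

Checking every device this way after an NMI may narrow down which board
asserted SERR, assuming the machine stays up long enough to read the
registers and nothing has cleared them in the meantime.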

Patrick

----------------------------------------------------------
|   Patrick Geoffray, Ph.D.      patrick at myri.com
|   Myricom, Inc.                http://www.myri.com
|   Cell:  865-389-8852          685 Emory Valley Rd (B)
|   Phone: 865-425-0978          Oak Ridge, TN 37830
----------------------------------------------------------
