intermittent crashing of programs
Patrick Geoffray
patrick at myri.com
Thu Feb 21 09:09:11 PST 2002
Donald Becker wrote:
> Could you elaborate? What PCI problems cause a NMI, and on which
> motherboards. You obviously have some first-hand experience with the
> problem. I'm guessing that you have helped many customers debug their
> hardware problems.
I have seen it on the IBM x330 and the Supermicro DLE: the SCSI board would
assert SERR on the PCI bus, and the system would turn it into an NMI. NMIs
are very hard to debug because it is hard to tell which device raised
them.
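Short of a bus analyser, one cheap first check is each device's PCI Status register (config offset 0x06). A device that asserted SERR sets the Signaled System Error bit (0x4000) there. The sketch below is an illustration, not the method used in this case. It assumes a Linux system that exposes config space under /sys/bus/pci/devices, and the function names are hypothetical. These bits are write-1-to-clear, and firmware or the kernel may already have cleared them, so a clean scan does not rule a device out.

```python
import glob
import os

# Error bits of the 16-bit PCI Status register (config offset 0x06).
STATUS_BITS = {
    0x8000: "Detected Parity Error",
    0x4000: "Signaled System Error (SERR#)",
    0x2000: "Received Master Abort",
    0x1000: "Received Target Abort",
    0x0800: "Signaled Target Abort",
    0x0100: "Master Data Parity Error",
}
SERR_BIT = 0x4000


def decode_status(status):
    """Return the names of the error bits set in a PCI Status word."""
    return [name for bit, name in STATUS_BITS.items() if status & bit]


def read_status(config_path):
    """Read the Status register (little-endian, bytes 6-7) from a config space file."""
    with open(config_path, "rb") as f:
        header = f.read(8)
    if len(header) < 8:
        return None
    return int.from_bytes(header[6:8], "little")


def scan(sysfs_root="/sys/bus/pci/devices"):
    """Yield (device, status, error names) for each device with an error bit set.

    Assumption: the sysfs layout <root>/<bus:dev.fn>/config (Linux-specific).
    """
    for dev in sorted(glob.glob(os.path.join(sysfs_root, "*"))):
        try:
            status = read_status(os.path.join(dev, "config"))
        except OSError:
            continue  # unreadable device; skip it
        if status is None:
            continue
        errors = decode_status(status)
        if errors:
            yield os.path.basename(dev), status, errors


if __name__ == "__main__":
    for dev, status, errors in scan():
        marker = "  <-- signaled SERR" if status & SERR_BIT else ""
        print(f"{dev}: status=0x{status:04x} {', '.join(errors)}{marker}")
```

Run it right after an NMI (as root, or at least with read access to the config files); any device flagged with SERR is a candidate for the kind of card swap described below.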
For this specific SCSI problem, we used a PCI analyser and saw the SERR
on the bus. I am not 100% sure why the SCSI board was failing with a SERR,
but it happened after the board had requested the bus and was waiting for
another PCI device to finish a long DMA transfer. Replacing the SCSI card
solved the problem in this case.
Patrick
----------------------------------------------------------
| Patrick Geoffray, Ph.D. patrick at myri.com
| Myricom, Inc. http://www.myri.com
| Cell: 865-389-8852 685 Emory Valley Rd (B)
| Phone: 865-425-0978 Oak Ridge, TN 37830
----------------------------------------------------------
More information about the Beowulf mailing list