intermittent crashing of programs
Daniel Kidger Daniel.Kidger at quadrics.com
Thu Feb 21 09:46:00 PST 2002
Donald Becker wrote:
>I think of parity errors being connected to NMI as being an obscure
>legacy part of the PC architecture, much like the "A20" line being
>switched by the keyboard controller. If the backwards compatibility
>broke, no one would notice.

Nope, not legacy - just look, for example, at any brand new Dell Pentium 4 system with RAMBUS ECC memory. Any 'multibit errors' generate an NMI.

Single-bit errors in ECC memory get spotted by the BIOS too, but the O/S will not be told, since they are corrected 'on-the-fly' by the hardware on reading the data. Hence 'memtest' will never detect these single-bit errors.

The other thing to get is 'ecc.o'. This is a kernel module that polls the motherboard chipset every second. It shows the single-bit and multibit errors in /proc/ram, collated by memory bank, e.g.:

[dan at fridge8]$ cat /proc/ram
Chipset ECC capability : ECC detection and correction
Current ECC mode       : ECC detection and correction
Bank  Size  Type  ECC     SBE  MBE
   0  256M  RMBS    Y  202758    0
   1  256M  RMBS    Y       0    5
   2  256M  RMBS    Y       0    2
   3  256M  RMBS    Y       0    0
   4  256M  RMBS    Y       0    0
   5  256M  RMBS    Y       0  257
   6  256M  RMBS    Y       0    0
   7  256M  RMBS    Y       0    0

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------
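On a cluster it can be handy to scan this report automatically and flag the banks that have logged errors. The following is only a rough sketch in Python: it assumes /proc/ram always has the six-column layout shown in the listing above (Bank, Size, Type, ECC, SBE, MBE), which is inferred from that one sample and not from any ecc.o documentation. The function names and the SAMPLE string are hypothetical, made up for illustration.

```python
# Sketch: parse a /proc/ram-style report and list banks that logged ECC errors.
# ASSUMPTION: the layout matches the sample in the post (six whitespace-separated
# columns after a header line starting with "Bank"). Not an official format.

SAMPLE = """\
Chipset ECC capability : ECC detection and correction
Current ECC mode : ECC detection and correction
Bank Size Type ECC SBE MBE
0 256M RMBS Y 202758 0
1 256M RMBS Y 0 5
2 256M RMBS Y 0 2
3 256M RMBS Y 0 0
4 256M RMBS Y 0 0
5 256M RMBS Y 0 257
6 256M RMBS Y 0 0
7 256M RMBS Y 0 0
"""


def parse_ram_report(text):
    """Return one dict per memory bank row found after the 'Bank' header."""
    banks = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:1] == ["Bank"]:
            in_table = True  # rows after this header are bank entries
            continue
        if in_table and len(fields) == 6 and fields[0].isdigit():
            banks.append({
                "bank": int(fields[0]),
                "size": fields[1],
                "type": fields[2],
                "ecc": fields[3] == "Y",
                "sbe": int(fields[4]),  # single-bit (corrected) errors
                "mbe": int(fields[5]),  # multibit (uncorrectable) errors
            })
    return banks


def banks_with_errors(banks):
    """Keep only banks whose single-bit or multibit counter is non-zero."""
    return [b for b in banks if b["sbe"] or b["mbe"]]


if __name__ == "__main__":
    # On a machine with the module loaded, one would read the real file
    # instead, e.g. open("/proc/ram").read(); SAMPLE is used here so the
    # sketch runs anywhere.
    for b in banks_with_errors(parse_ram_report(SAMPLE)):
        print("bank %d: SBE=%d MBE=%d" % (b["bank"], b["sbe"], b["mbe"]))
```

Splitting on whitespace keeps the parser indifferent to column alignment, so it should cope with the counters growing wider over time.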
More information about the Beowulf mailing list
