[Beowulf] NMI (Non-maskable interrupts)
Mark Hahn
hahn at mcmaster.ca
Mon Mar 17 15:02:24 PDT 2008
> From my understanding, NMI is not good since the processors really
> have to handle these interrupts right away and these might degrade the
> performance of the nodes.
I think you're mistaken: NMIs of the sort you're talking about will
result in a panic. These NMIs are probably just inter-processor interrupts
used for low-level kernel synchronization, such as when one CPU needs the
others to do something immediately, like changing the status of a page in
their MMUs.
For instance, I notice that more recent kernels classify interrupts
more finely:
[root@experiment ~]# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:         68          0          0          0   IO-APIC-edge    timer
  1:          0          0          0         10   IO-APIC-edge    i8042
  4:          0          0          0          2   IO-APIC-edge
  8:          0          0          0          0   IO-APIC-edge    rtc
  9:          0          0          0          0   IO-APIC-fasteoi acpi
 12:          0          0          0          4   IO-APIC-edge    i8042
 14:          0          0          0          0   IO-APIC-edge    ide0
 17:          0          0          0          0   IO-APIC-fasteoi sata_nv
 18:          0          0          0          0   IO-APIC-fasteoi sata_nv
 19:     123229        148        514       4698   IO-APIC-fasteoi sata_nv
362:  127524168    5281605     236961     121506   PCI-MSI-edge    eth1
377:     519748   12731137     607115   42573852   PCI-MSI-edge    eth0:MSI-X-2-RX
378:     109154      80191  302109913    6487104   PCI-MSI-edge    eth0:MSI-X-1-TX
NMI:          0          0          0          0   Non-maskable interrupts
LOC:  300446104  300446082  300446060  300446038   Local timer interrupts
RES:    2698262      44102    2234502    3677120   Rescheduling interrupts
CAL:       4135       4379       4460        415   function call interrupts
TLB:      14018      15088       4079       7251   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
SPU:          0          0          0          0   Spurious interrupts
ERR:          0
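To pull a single row such as NMI out of this listing and total it across CPUs, a small shell function along these lines could work. It is a minimal sketch that assumes the layout shown above (a header line of CPU columns, then one row per interrupt source whose first field ends in a colon); the name irq_total is hypothetical and not a standard tool.

```shell
#!/bin/sh
# Hypothetical helper: sum the per-CPU counts on one row of a
# /proc/interrupts-style file.
# Usage: irq_total LABEL [FILE]    (FILE defaults to /proc/interrupts)
# Prints the total; exits non-zero if no row has that label.
irq_total() {
    label="$1"
    file="${2:-/proc/interrupts}"
    awk -v want="$label:" '
        # The header line has one field per CPU column.
        NR == 1 { ncpu = NF; next }
        # On the matching row, add up only the numeric CPU columns,
        # stopping before the chip/device description fields.
        $1 == want {
            sum = 0
            for (i = 2; i <= ncpu + 1 && i <= NF; i++) sum += $i
            printf "%.0f\n", sum
            found = 1
        }
        END { if (!found) exit 1 }
    ' "$file"
}

# Example: irq_total NMI    (prints 0 for the listing above)
```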
I suspect that in earlier kernels, all the counts listed from RES onward
were lumped into NMI. Obviously, rescheduling, function-call and
TLB-shootdown interrupts are perfectly normal and don't indicate any error
(though you might want to minimize them as well).
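Whether these are worth minimizing depends more on their rate than on the totals accumulated since boot. One rough way to see the rate is to snapshot /proc/interrupts twice, a known number of seconds apart, and divide the difference by the interval. The sketch below assumes the same layout as the listing above; irq_rates and the /tmp snapshot paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: per-second rates for selected /proc/interrupts rows
# between two snapshots taken SECONDS apart (SECONDS must be > 0).
# Usage: irq_rates BEFORE AFTER SECONDS LABEL...
# Labels missing from either snapshot are silently skipped.
irq_rates() {
    before="$1"; after="$2"; secs="$3"; shift 3
    labels="$*"
    awk -v secs="$secs" -v labels="$labels" '
        # Sum the CPU columns of the current row (ncpu comes from the header).
        function rowsum(   i, s) {
            s = 0
            for (i = 2; i <= ncpu + 1 && i <= NF; i++) s += $i
            return s
        }
        FNR == 1 { ncpu = NF; next }          # header line of each file
        NR == FNR { first[$1] = rowsum(); next }  # rows of the BEFORE file
        { second[$1] = rowsum() }             # rows of the AFTER file
        END {
            n = split(labels, want, " ")
            for (k = 1; k <= n; k++) {
                key = want[k] ":"
                if ((key in first) && (key in second))
                    printf "%s %.1f/s\n", want[k], (second[key] - first[key]) / secs
            }
        }
    ' "$before" "$after"
}

# Example:
#   cat /proc/interrupts > /tmp/irq.0; sleep 10; cat /proc/interrupts > /tmp/irq.1
#   irq_rates /tmp/irq.0 /tmp/irq.1 10 NMI RES CAL TLB
```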
How about trying a newer kernel? The output above is from 2.6.24.3. Note
that if you're running certain ranges of older kernels, you might be missing
important security fixes.