Fw: [eepro100] Kernel panic: Aiee, killing interrupt handler!
Problem with 440BX chipset
Kallol Biswas
kallol@efi.com
Mon Jan 21 13:38:01 2002
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<body bgcolor="#FFFFFF">
Hello,
<br>I have found the root cause of the kernel crash that occurs on our system
<br>when network traffic is very high. It is the 440BX chipset:
the bridge writes DMA data to the wrong memory location.
<br>As a result, the size of the corrupted region is usually a multiple
of the system cache-line size.
<br>Because this is memory corruption, the panics happen
at random places, depending on what gets corrupted.
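<p>One way to look for the cache-line pattern on another machine is to fill a buffer with a known byte value, run heavy network traffic, and then look for the first run of bytes that changed. The sketch below assumes the stray DMA writes can land in such a buffer; find_corrupt_run, looks_like_whole_lines and the fill-pattern approach are hypothetical illustrations, not an existing tool, and CACHE_LINE assumes the 32-byte lines of P6-family CPUs such as the Pentium III and Celeron.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: P6-family CPUs (Pentium III, Celeron) use 32-byte cache lines. */
#define CACHE_LINE 32u

struct corrupt_run {
    size_t start;   /* offset of the first byte that no longer matches */
    size_t len;     /* number of consecutive mismatching bytes; 0 = intact */
};

/* Find the first run of bytes in buf that differ from the fill pattern. */
static struct corrupt_run find_corrupt_run(const uint8_t *buf, size_t n,
                                           uint8_t pattern)
{
    struct corrupt_run run = { 0, 0 };
    size_t i = 0;

    while (i < n && buf[i] == pattern)   /* skip intact prefix */
        i++;
    run.start = i;
    while (i < n && buf[i] != pattern)   /* measure the damaged run */
        i++;
    run.len = i - run.start;
    return run;
}

/* 1 if the run starts on a cache-line boundary and covers whole lines. */
static int looks_like_whole_lines(struct corrupt_run run)
{
    return run.len != 0 && run.start % CACHE_LINE == 0
           && run.len % CACHE_LINE == 0;
}
```

<p>A run whose start and length are both multiples of 32 is consistent with a misdirected DMA burst. If the incoming data happens to contain the fill byte, one burst can show up as several shorter runs, so a single odd-sized run is not conclusive.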
<p>Thanks to the American Arium device. I wish the crash dump utility
worked better in Linux.
<p>The workaround is to change the bridge's setting in the BIOS.
<p>Please let me know if your system has a 440BX chipset. You can find out
with the lspci command.
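<p>With lspci -n, the host bridge appears as a numeric vendor:device pair. The sketch below checks one such line; it assumes the IDs 8086:7190 and 8086:7192, which the public pci.ids database lists for the 82443BX/ZX/DX (440BX) host bridge, and the helper name is_440bx_host_bridge is made up for illustration.

```c
#include <stdio.h>

/* Returns 1 if one line of "lspci -n" output, e.g.
 * "00:00.0 0600: 8086:7190 (rev 03)", names a 440BX host bridge.
 * Assumption: IDs 8086:7190 and 8086:7192 as listed in pci.ids. */
static int is_440bx_host_bridge(const char *line)
{
    unsigned int vendor, device;

    /* skip the slot and the class code, then read vendor:device */
    if (sscanf(line, "%*s %*x: %x:%x", &vendor, &device) != 2)
        return 0;
    return vendor == 0x8086u && (device == 0x7190u || device == 0x7192u);
}
```

<p>Feeding each line of lspci -n through this check flags the 440BX host bridge itself, while the chipset's separate AGP bridge function (8086:7191) is not reported.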
<br>
<p>Kallol
<br>
<p>Kallol Biswas wrote:
<blockquote TYPE=CITE>Have you tried using a different card? I suspect
you will see the panic again.
<br>We have been having this problem on a Celeron-based system. The
progress I have made
<br>so far is that, for some reason, the memory holding the global variables
is getting corrupted.
<p>One of the panics:
<p>Subject: Re: [eepro100] Kernel panic: Aiee, killing interrupt handler!
<br>Date: Wed, 09 Jan 2002 18:27:19 -0800
<br>From: Kallol Biswas &lt;kallol@efi.com&gt;
<br>CC: gareth.baron@efi.com, frederic.roussel@efi.com
<br>
<br>
<br>
<p>Here is a dmesg buffer from a crash:
<pre>
Oops: 0000
CPU:    0
EIP:    0010:[&lt;3b3a3938&gt;]
EFLAGS: 00010046
eax: 3b3a3938   ebx: 00000000   ecx: c01b9f24   edx: 00000000
esi: c01b8000   edi: 00003fb9   ebp: c01b9f1c   esp: c01b9f08
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, process nr: 0, stackpage=c01b9000)
Stack: c01b9f24 00000000 c010a3c9 c01184a5 00000000 00001fda c0109fc8 00000000
       c01b8000 00000000 c01b8000 00003fb9 00001fda 00000000 00000018 ffff0018
       ffffff00 c0107811 00000010 00000246 c01b8000 0009b800 c0106000 0009b800
Call Trace: [&lt;c010a3c9&gt;] [&lt;c01184a5&gt;] [&lt;c0109fc8&gt;] [&lt;c0107811&gt;] [&lt;c0106000&gt;] [&lt;c0107f9d&gt;] [&lt;c010783a&gt;]
       [&lt;c01090b0&gt;] [&lt;c0109068&gt;] [&lt;c0106000&gt;] [&lt;c0100018&gt;] [&lt;c0106088&gt;] [&lt;c0106000&gt;] [&lt;c0100175&gt;]
Code: &lt;1&gt;Unable to handle kernel paging request at virtual address 3b3a3938
current-&gt;tss.cr3 = 00101000, %cr3 = 00101000
</pre>
<br>
<p>The decoded trace is:
<p>c010a3c9: do_IRQ (this is where it calls do_bottom_half)
<br>c01184a5: do_bottom_half
<br>c0109fc8: common_interrupt
<br>...
<p>The panic is caused by the invalid EIP value 0010:3b3a3938 (note that the
<br>EAX register holds the same value).
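<p>A detail worth checking: x86 is little-endian, so the value 3b3a3938 sits in memory as the bytes 38 39 3a 3b (ASCII "89:;"). Consecutive byte values like these look more like data, for example a counting pattern in a packet, than like a code address; every kernel text address in this oops starts with c01. That is suggestive, not proof. The helpers below are hypothetical, and the 0xc0000000 threshold assumes the default i386 3G/1G address split.

```c
#include <stdint.h>

/* Byte i (0 = lowest address) of a 32-bit value stored little-endian, as on x86. */
static uint8_t le32_byte(uint32_t v, unsigned i)
{
    return (uint8_t)(v >> (8 * i));
}

/* 1 if the four bytes, in memory order, count upward by one (e.g. 38 39 3a 3b). */
static int le32_is_counting_pattern(uint32_t v)
{
    for (unsigned i = 1; i < 4; i++)
        if (le32_byte(v, i) != (uint8_t)(le32_byte(v, i - 1) + 1))
            return 0;
    return 1;
}

/* Assumption: default i386 3G/1G split, so kernel addresses are >= 0xc0000000. */
static int looks_like_kernel_address(uint32_t v)
{
    return v >= 0xc0000000u;
}
```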
<p>The do_bottom_half routine:
<pre>
asmlinkage void do_bottom_half(void)
{
    int cpu = smp_processor_id();

    if (softirq_trylock(cpu)) {
        if (hardirq_trylock(cpu)) {
            __sti();
            run_bottom_halves();
            __cli();
            hardirq_endlock(cpu);
        }
        softirq_endlock(cpu);
    }
}
</pre>
<p>The disassembly of the routine from the object file:
<pre>
00000000 &lt;do_bottom_half&gt;:
   0:  83 ec 0c              sub    $0xc,%esp
   3:  55                    push   %ebp
   4:  57                    push   %edi
   5:  56                    push   %esi
   6:  53                    push   %ebx
   7:  83 3d 00 00 00 00 00  cmpl   $0x0,0x0
   e:  75 54                 jne    64 &lt;do_bottom_half+0x64&gt;
  10:  c7 05 00 00 00 00 01  movl   $0x1,0x0
  17:  00 00 00
  1a:  31 ed                 xor    %ebp,%ebp
  1c:  bf 00 00 00 00        mov    $0x0,%edi
  21:  83 3d 00 00 00 00 00  cmpl   $0x0,0x0
  28:  75 32                 jne    5c &lt;do_bottom_half+0x5c&gt;
  2a:  fb                    sti
  2b:  8b 1d 00 00 00 00     mov    0x0,%ebx
  31:  23 1d 00 00 00 00     and    0x0,%ebx
  37:  89 d8                 mov    %ebx,%eax
  39:  f7 d0                 not    %eax
  3b:  ba 00 00 00 00        mov    $0x0,%edx
  40:  21 02                 and    %eax,(%edx)
  42:  be 00 00 00 00        mov    $0x0,%esi
  47:  90                    nop
  48:  f6 c3 01              test   $0x1,%bl
  4b:  74 04                 je     51 &lt;do_bottom_half+0x51&gt;
  4d:  8b 06                 mov    (%esi),%eax
  4f:  ff d0                 call   *%eax
  51:  83 c6 04              add    $0x4,%esi
  54:  c1 eb 01              shr    $0x1,%ebx
  57:  75 ef                 jne    48 &lt;do_bottom_half+0x48&gt;
  59:  fa                    cli
  5a:  89 f6                 mov    %esi,%esi
  5c:  c7 44 3d 00 00 00 00  movl   $0x0,0x0(%ebp,%edi,1)
  63:  00
  64:  5b                    pop    %ebx
  65:  5e                    pop    %esi
  66:  5f                    pop    %edi
  67:  5d                    pop    %ebp
  68:  83 c4 0c              add    $0xc,%esp
  6b:  c3                    ret
</pre>
<p>The machine code read from the emulator after the crash is:
<pre>
0010:C0118454 83 EC 0C 55 57 56 53 83 3D 60 67 1D C0 00 75 54
0010:C0118464 C7 05 60 67 1D C0 01 00 00 00 31 ED BF 60 67 1D
0010:C0118474 C0 83 3D 5C 67 1D C0 00 75 32 FB 8B 1D AC BD 1A
0010:C0118484 C0 23 1D A8 BD 1A C0 89 D8 F7 D0 BA A8 BD 1A C0
0010:C0118494 21 02 BE 80 67 1D C0 90 F6 C3 01 74 04 8B 06 FF
0010:C01184A4 D0 83 C6 04 C1 EB 01 75 EF FA 89 F6 C7 44 3D 00
0010:C01184B4 00 00 00 00 5B 5E 5F 5D 83 C4 0C C3
</pre>
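<p>Apart from the address fields that are zero in the object file and filled in at link time, these bytes match the disassembly, so the machine code for do_bottom_half is not corrupted.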
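<p>The comparison can be made mechanical by skipping the relocation fields. The sketch below covers only the first 16 bytes of do_bottom_half; the relocation mask was worked out by hand from the disassembly (the 4-byte address operand of the first cmpl, which the emulator shows as 60 67 1D C0, i.e. c01d6760), and the function name code_matches is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* First 16 bytes of do_bottom_half from the object-file disassembly ... */
static const uint8_t obj_bytes[16] = {
    0x83, 0xec, 0x0c, 0x55, 0x57, 0x56, 0x53, 0x83,
    0x3d, 0x00, 0x00, 0x00, 0x00, 0x00, 0x75, 0x54
};
/* ... and the same bytes read back by the emulator at 0010:C0118454. */
static const uint8_t mem_bytes[16] = {
    0x83, 0xEC, 0x0C, 0x55, 0x57, 0x56, 0x53, 0x83,
    0x3D, 0x60, 0x67, 0x1D, 0xC0, 0x00, 0x75, 0x54
};
/* Bytes 9..12 hold the absolute address operand of "cmpl $0x0,0x0";
 * the object file has zeros there until the link step fills them in. */
static const uint8_t reloc_mask[16] = {
    0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0
};

/* 1 if every byte outside the relocation fields is identical. */
static int code_matches(const uint8_t *obj, const uint8_t *mem,
                        const uint8_t *reloc, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!reloc[i] && obj[i] != mem[i])
            return 0;
    return 1;
}
```

<p>Without the mask the check fails at byte 9, which is why a plain byte-for-byte diff of object file and memory is misleading here.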
<p>At offset 4f, the instruction call *%eax loads EIP
with
<br>3b3a3938, which is the value in EAX.
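<p>The call trace agrees with this. The entry c01184a5 minus the start of do_bottom_half in the emulator dump, c0118454, is 0x51: exactly the instruction after the 2-byte call *%eax (ff d0) at offset 4f, so that entry is the return address pushed by this very call. A minimal check of the arithmetic, with constants copied from the dumps above and a hypothetical helper name:

```c
#include <stdint.h>

/* Values copied from the oops and the emulator dump. */
#define DO_BOTTOM_HALF_START 0xc0118454u /* first byte of do_bottom_half */
#define CALL_EAX_OFFSET      0x4fu       /* "ff d0  call *%eax" */
#define CALL_EAX_LEN         2u          /* length of that instruction */

/* 1 if ret_addr is the return address a call at (start + call_off) pushes. */
static int is_return_address_of(uint32_t ret_addr, uint32_t start,
                                uint32_t call_off, uint32_t call_len)
{
    return ret_addr - start == call_off + call_len;
}
```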
<br>
<pre>
/* body of run_bottom_halves(), called from do_bottom_half() above */
    active = get_active_bhs();
    clear_active_bhs(active);
    bh = bh_base;
    do {
        if (active &amp; 1)
            (*bh)();
        bh++;
        active &gt;&gt;= 1;
    } while (active);
}
</pre>
<p>The call *%eax instruction corresponds to (*bh)();
<p>so the memory at bh_base was probably corrupted.
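<p>To catch a corrupted table before the jump rather than after, one could snapshot bh_base once all handlers are registered and compare each slot before calling it. The user-space model below only illustrates that idea; it is not kernel code, and run_bh_checked, demo_corrupted_slot and the two sample handlers are hypothetical. It walks the active mask the same way as the loop above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_BH 32                 /* one bit of the active mask per slot */

typedef void (*bh_fn)(void);

static int timer_runs;           /* how often the sample handler ran */
static void sample_timer_bh(void) { timer_runs++; }
static void sample_net_bh(void)   { }

/* Same walk as run_bottom_halves(): bit i of `active` selects table[i].
 * Unlike the kernel loop, a slot whose pointer differs from the snapshot
 * is skipped instead of called. Returns the first such slot, or -1. */
static int run_bh_checked(bh_fn *table, const bh_fn *snapshot,
                          unsigned long active)
{
    int bad = -1;

    for (size_t i = 0; active != 0 && i < NR_BH; i++, active >>= 1) {
        if (!(active & 1) || table[i] == NULL)
            continue;
        if (table[i] != snapshot[i]) {
            if (bad < 0)
                bad = (int)i;
            continue;            /* do not jump to garbage */
        }
        table[i]();
    }
    return bad;
}

/* Simulates the failure: slot 1 is overwritten with the value seen in EIP. */
static int demo_corrupted_slot(void)
{
    bh_fn table[NR_BH] = { sample_timer_bh, sample_net_bh };
    bh_fn snapshot[NR_BH];

    memcpy(snapshot, table, sizeof table);
    table[1] = (bh_fn)(uintptr_t)0x3b3a3938u;   /* compared, never called */
    return run_bh_checked(table, snapshot, 0x3);
}
```

<p>In the model the intact timer slot still runs and the overwritten slot is reported instead of executed. A real kernel version would also have to protect the snapshot itself, since stray DMA could hit it as well.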
<br>
<br>
<p>sam wrote:
<blockquote TYPE=CITE>
<font face="Arial"><font size=-1>Hello All,</font></font>
<br><font face="Arial"><font size=-1>I did not get any solution, so I am posting again.</font></font>
<br><font face="Arial"><font size=-1>Sam</font></font>
<div style="FONT: 10pt arial">----- Original Message -----
<div style="BACKGROUND: #e4e4e4; font-color: black"><b>From:</b> <a href="mailto:mrjackin@yahoo.co.uk" title="mrjackin@yahoo.co.uk">sam</a></div>
<b>To:</b> <a href="mailto:eepro100@scyld.com" title="eepro100@scyld.com">eepro100@scyld.com</a>
<br><b>Sent:</b> Thursday, January 10, 2002 4:35 AM
<br><b>Subject:</b> [eepro100] Kernel panic: Aiee, killing interrupt handler!</div>
<font face="Arial"><font size=-1>Hello All,</font></font>
<br><font face="Arial"><font size=-1>I am getting this error when the Linux machine gets more traffic on the NIC:</font></font>
<br><font face="Arial"><font size=-1>Kernel panic: Aiee, killing interrupt handler!</font></font>
<br><font face="Arial"><font size=-1>In interrupt handler - not syncing</font></font>
<br><font face="Arial"><font size=-1>The system configuration is:</font></font>
<br><font face="Arial"><font size=-1>P-III 900 MHz<br>512 MB RAM<br>2 onboard 100 Mbps eepro100 NICs<br>Red Hat 7.1, kernel 2.4.13</font></font>
<br><font face="Arial"><font size=-1>Thanks,<br>Sam</font></font></blockquote>
</blockquote>
</body>
</html>