GNIC-II atop 2.2.12 panics NFSv3 server

Chris Worley cworley@altatech.com
Thu Oct 7 14:27:49 1999


I'm running Don/Eric's v0.08 hamachi.c (with some hand-installed patches
to force 32-bit operation for boards in an ASUS motherboard) on kernel
2.2.12, with RH6.0's knfsd RPMs and Trond Myklebust's NFSv3 patches for
the 2.2.12 kernel (he put out new patches today, which I tried, but they
didn't fix the problem).

There seems to be a timing (or resource saturation) issue between the
NFSv3 server and the hamachi driver.

With two NFSv3 clients feeding off one NFSv3 server over 100BaseT
(Netgear tulips), the NFS server performs at about 10 to 12 MBytes/sec
(the best performance I've ever seen from any protocol atop 100BaseT).

When I switch the clients and server to GNIC-II interfaces, kmem_free on
the server goes nuts, and the server eventually panics.

This panic does not occur when I export an IDE disk over NFSv3 on the
GNIC-IIs, but then I also get really slow data transfer rates
(bottlenecked by the slow IDE junk).

When I export a fast FCAL disk array (using Chris Loveland's qlogicfc
driver; stock in 2.2.12), I can saturate the 100BaseT, but I panic using
GigE.

Here's the output of the panic.  Note that this is taken from a serial
console that truncates at column 80, and sometimes intertwines
messages.  First, I get a zillion messages from kmem_free on the NFS
server:

kmem_free: Either bad obj addr or double free (objp=c5e9a800,
name=size-2048)
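
To check whether the same object is reported more than once, the objp
values can be pulled out of a captured console log. The following is only
a rough sketch: the function name count_kmem_free is made up for
illustration, and the pattern assumes the message text shown above, with
the serial-console line wrap allowed after the comma. Messages that the
console has intertwined with other output will not match and are skipped.

```python
import re
from collections import Counter

# Matches one kmem_free complaint. \s* after the comma tolerates the
# line wrap that the serial console inserts in the middle of a message.
KMEM_FREE_RE = re.compile(
    r"kmem_free: Either bad obj addr or double free "
    r"\(objp=([0-9a-f]{8}),\s*name=([\w-]+)\)"
)

def count_kmem_free(log_text):
    """Return a Counter mapping (objp, cache name) to occurrence count.

    Hypothetical helper, not part of any kernel tooling.
    """
    return Counter(KMEM_FREE_RE.findall(log_text))
```

An objp that shows up with a count above one would point towards a
repeated free of the same buffer rather than a stream of unrelated bad
addresses.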

Each message has a different objp.  Eventually I get the recursive panic
(I've annotated the call stack with the appropriate routine names in the
first call trace):

Unable to handle kernel NULL pointer dereferenced obj addr or d0
uble free (objp=current->tss.cr3 = 00101000, %cr3 = 00101000
c5e81800, name=s*pde = 00000000
ize-2048)
OctOops: 0002
CPU:    0
EIP:    0010:[<c012209b>]
EFLAGS: 00010292
eax: 0000001b   ebx: c03675a0   ecx: 000003fd   edx: 00000001
esi: c5e24000   edi: c03ef6bc   ebp: c7fb0000   esp: c5b8debc
ds: 0018   es: 0018   ss: 0018
Process nfsd (pid: 891, process nr: 25, stackpage=c5b8d000)
Stack: c03ef660 c03ef68c c03ef6bc c7fb0000 c017802a c5e4a400 c01703e8
c5e24000
       c03ef660 c017049c c03ef660 c03ef660 c606b080 c018f4e5 c03ef660
c5cf7080
       c0689ca0 c7de9100 c0689ca0 c0171d52 c03ef660 c7fb0000 c7de9100
00000001
Call Trace: [<c017802a>ip_local_deliver] [<c01703e8>kfree_skbmem]
[<c017049c>__kfree_skb] [<c018f4e5>packet_rcv] [<c0171d52>net_bh]
[<        [<c0111497>schedule_timeout] [<c0197020>svc_recv]
[<c015c919>nfsd] [<c010658f>kernel_thread]
Code: c7 05 00 00 00 00 00 00 00 00 83 c4 08 5b 5e 5f 5d 83 c4 08
  8 01:05:14 n9 Aiee, killing interrupt handler
kernel: kmem_freScheduling in interrupt
 free (objp=c5e7current->tss.cr3 = 00101000, %cr3 = 00101000
e000, name=size-*pde = 00000000
2048)
Oct  8 Oops: 0002
CPU:    0
EIP:    0010:[<c01118fe>]
EFLAGS: 00010282
eax: 00000018   ebx: c5b8c000   ecx: 000003fd   edx: 00000001
esi: 00000020   edi: c5b8c000   ebp: c5b8de1c   esp: c5b8ddf8
ds: 0018   es: 0018   ss: 0018
Process nfsd (pid: 891, process nr: 25, stackpage=c5b8d000)
Stack: 00000020 c5b8c000 00000202 c0118175 c5b8c000 c5b8c000 00000000
00000000
       c5b8c000 c5b8de3c c011858e c5b8de80 00000000 c5b8c000 c61db0c0
00000001
       c5b8c000 c7fb0000 c0108027 0000000b 00000000 c010f3a8 c01d984e
c5b8de80
Call Trace: [<c0118175>] [<c011858e>] [<c0108027>] [<c010f3a8>]
[<c01d984e>] [<       [<c01db382>] [<c017802a>] [<c01703e8>]
[<c017049c>] [<c018f4e5>] [<c0171       [<c0111497>] [<c0197020>]
[<c015c919>] [<c010658f>]
Code: c7 05 00 00 00 00 00 00 00 00 8d 65 d8 5b 5e 5f 89 ec 5d c3
01:05:14 n9 kernAiee, killing interrupt handler
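
The second call trace is not annotated. One way to put names on those
addresses is to look each one up in the System.map from the same kernel
build, taking the nearest symbol at or below the address. What follows is
a minimal sketch of that lookup, not a replacement for proper oops
decoding; the helper names (load_system_map, resolve, trace_addresses)
are hypothetical, and it assumes the usual three-column
"address type name" System.map layout.

```python
import bisect
import re

def load_system_map(lines):
    """Parse 'address type name' lines into a sorted list of (addr, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:  # skip anything that is not a symbol line
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols

def resolve(symbols, addr):
    """Return 'name+0xoffset' for the nearest symbol at or below addr."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address is below every known symbol
    base, name = symbols[i]
    return "%s+0x%x" % (name, addr - base)

def trace_addresses(oops_text):
    """Pull every complete [<xxxxxxxx> entry out of an oops dump.

    Entries cut short by the console (fewer than 8 hex digits) are skipped.
    """
    return [int(m, 16) for m in re.findall(r"\[<([0-9a-f]{8})>", oops_text)]
```

Usage would be along the lines of loading the map once with
load_system_map(open(path_to_system_map)) and then calling resolve() on
each address returned by trace_addresses(); the location of System.map
depends on the installation.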

Any ideas?

Thanks,

Chris