kernel crashes with Linksys Etherfast PCI card, IP Masquerading

Eric Ding ericding@MIT.EDU
Tue Nov 24 11:12:06 1998


Hi all,

I'm running Red Hat 5.2 (kernel 2.0.36) in an IP Masquerading setup with two
Linksys Etherfast 10/100 PCI cards.  During its first week of operation,
I've seen several major kernel crashes and dumps.  The cards initialize with
the following output:

tulip.c:v0.90 10/20/98 becker@cesdis.gsfc.nasa.gov
eth0: Lite-On 82c168 PNIC at 0xe800, 00 a0 cc 22 f9 c0, IRQ 11.
eth0:  MII transceiver #1 config 3100 status 782d advertising 01e1.
eth1: Lite-On 82c168 PNIC at 0xe400, 00 a0 cc 23 09 71, IRQ 10.
eth1:  MII transceiver #1 config 3100 status 782d advertising 01e1.
eth0: The transmitter stopped!  CSR5 is 2678016, CSR6 812e2002.
eth0: Changing PNIC configuration to half-duplex, CSR6 812e0000.
eth1: The transmitter stopped!  CSR5 is 2678016, CSR6 816e2002.
eth1: Changing PNIC configuration to half-duplex, CSR6 816e0000.
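
For anyone who wants to see exactly which bits changed between the two CSR6
values logged for eth0 above, here is a minimal sketch that XORs them.  The
bit names are an assumption borrowed from the DEC 21x4x CSR6 layout; the
Lite-On PNIC may not follow it exactly.

```python
# CSR6 values copied from the eth0 boot messages above.
CSR6_BEFORE = 0x812E2002  # logged with "The transmitter stopped!"
CSR6_AFTER = 0x812E0000   # logged with "Changing PNIC configuration..."

# ASSUMPTION: names follow the DEC 21x4x CSR6 layout; the PNIC may differ.
ASSUMED_BITS = {1: "start receive", 9: "full duplex", 13: "start transmit"}


def changed_bits(before, after):
    """Return the bit positions that differ between two register values."""
    diff = before ^ after
    return [bit for bit in range(32) if diff & (1 << bit)]


if __name__ == "__main__":
    for bit in changed_bits(CSR6_BEFORE, CSR6_AFTER):
        name = ASSUMED_BITS.get(bit, "unknown")
        print(f"bit {bit} changed ({name})")
```

For the values above, the only bits that differ are 1 and 13; bit 9 is clear
in both.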

I upgraded the tulip driver to 0.90 after 0.89H crashed with errors like:

  Warning: kfree_skb passed an skb still on a list (from 00096e6c).

But I am still getting crashes with 0.90.  Here's one of the kernel dumps I
found in /var/log/messages:

general protection: 0000
CPU:    0
EIP:    0010:[<00155b48>]
EFLAGS: 00010286
eax: 00800000   ebx: 02006900   ecx: a016168b   edx: 0a000003
esi: 0300000a   edi: ffffff00   ebp: 000000ff
esp: 00720c04
ds: 0018   es: 0018   fs: 0018   gs: 002b   ss: 0018
Process ls (pid: 419, process nr: 4, stackpage=00720000)
Stack: 001c555c 00648810 00000000 00000000 03000000 ffffff00 001522a1 0300000a 
       00648810 00000000 00720ce0 0000008c 0077813c 0076e067 0011c870 0804fc80 
       00152334 00648810 0065b018 0000008c 00000000 00000000 001c555c 00000010 
Call Trace: [<001522a1>] [<0011c870>] [<00152334>] [<00156f41>] [<0101cf7f>] [<0101c2fc>] [<0101c22c>] 
       [<0015e944>] [<00128af6>] [<001292f2>] [<0015f6c1>] [<0015002e>] [<001211c3>] [<0101c056>] [<0101a2a3>] 
       [<0101e370>] [<0012f5a7>] [<001259ed>] [<00125a5b>] [<00125b8f>] [<0010ab39>] 
Code: 8b 41 38 25 01 00 ff ff 3d 01 00 02 00 75 58 8b 41 58 85 c0 
Problem: block on freelist at 00093cdc isn't free.
Nov 13 16:59:58 jlu last message repeated 190 times
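
The raw addresses in the EIP and Call Trace lines above can be mapped to
symbol names using the System.map that matches the running kernel.  Below is
a minimal sketch of that lookup, assuming the usual "address type name" line
format.  The symbol names in the example are hypothetical, not taken from my
kernel, and addresses that fall inside loadable modules will not resolve
sensibly this way (they just land on the last symbol with a huge offset).

```python
import bisect


def load_system_map(lines):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        address, _kind, name = parts
        symbols.append((int(address, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return (name, offset) for the nearest symbol at or below address."""
    addresses = [addr for addr, _name in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None  # address is below the first known symbol
    base, name = symbols[index]
    return name, address - base


if __name__ == "__main__":
    # HYPOTHETICAL System.map excerpt, for illustration only.
    sample = ["00152200 T example_func_a", "00155b00 T example_func_b"]
    symbols = load_system_map(sample)
    print(resolve(symbols, 0x00155B48))
```

In practice the lines would come from the real map file (for example
open("/boot/System.map"), assuming that is where it lives on the system).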

Another one of my dumps repeated the following message thousands of times,
as far as I can tell:

Nov 17 19:16:25 jlu kernel: k on freelist at 0029d808 isn't free.
Nov 17 19:16:25 jlu kernel: Problem: block on freelist at 0029d808 isn't free.
Nov 17 19:16:27 jlu last message repeated 360 times
Nov 17 19:16:27 jlu kernel: k on freelist at 0029d808 isn't free.
Nov 17 19:16:27 jlu kernel: Problem: block on freelist at 0029d808 isn't free.
Nov 17 19:16:27 jlu last message repeated 242 times
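
To get a real count instead of my "thousands" estimate, the "last message
repeated N times" lines have to be added in.  Here is a rough sketch of such
a tally.  It assumes each repeat line refers to the message logged
immediately before it, and that lines carry the usual "Mon DD HH:MM:SS host"
prefix shown above.

```python
import re

REPEAT = re.compile(r"last message repeated (\d+) times")
# Matches the "Mon DD HH:MM:SS host " prefix of a syslog line.
PREFIX = re.compile(r"^\w{3}\s+\d+ \d\d:\d\d:\d\d \S+ ")


def tally(lines):
    """Count each distinct message, folding in syslog repeat summaries."""
    counts = {}
    last = None
    for line in lines:
        body = PREFIX.sub("", line.strip())
        match = REPEAT.search(body)
        if match:
            # ASSUMPTION: the repeat count applies to the previous message.
            if last is not None:
                counts[last] += int(match.group(1))
            continue
        counts[body] = counts.get(body, 0) + 1
        last = body
    return counts
```

Run over all of /var/log/messages rather than just the six lines above, this
would give the total per message.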

/proc/pci says:

PCI devices found:
  Bus  0, device  12, function  0:
    Ethernet controller: Unknown vendor LNE100TX (rev 32).
      Medium devsel.  Fast back-to-back capable.  IRQ 10.  Master Capable.  Latency=80.  
      I/O at 0xe400.
      Non-prefetchable 32 bit memory at 0xf8ffe000.
  Bus  0, device  11, function  0:
    Ethernet controller: Unknown vendor LNE100TX (rev 32).
      Medium devsel.  Fast back-to-back capable.  IRQ 11.  Master Capable.  Latency=80.  
      I/O at 0xe800.
      Non-prefetchable 32 bit memory at 0xf8fff000.
  Bus  0, device  10, function  0:
    VGA compatible controller: Cirrus Logic GD 5434 (rev 142).
      Fast devsel.  IRQ 9.  
      Non-prefetchable 32 bit memory at 0xf9000000.
  Bus  0, device   5, function  0:
    Host bridge: Silicon Integrated Systems 85C496 (rev 0).
      Medium devsel.  Fast back-to-back capable.  Master Capable.  No bursts.  

And ./tulip-diag, when given the explicit I/O port addresses, says:

[root@jlu ericding]# ./tulip-diag -f -e -e -a -m -m -p 0xe400
tulip-diag.c:v1.06 9/18/98 Donald Becker (becker@cesdis.gsfc.nasa.gov)
Digital DC21040 Tulip Tulip chip registers at 0xe400:
  00004800 01ff0000 42422600 0040b820 0040ba20 02660010 816e2002 0001ebef
  00000000 00000000 0040bab0 00093cf4 00000025 00000000 00000000 10000001
 The Rx process state is 'Waiting for packets'.
 The Tx process state is 'Idle'.
Transmit started, Receive started, half-duplex.
 The transmit unit is set to store-and-forward.
 Port selection is MII 100baseTx scrambler, half-duplex.
EEPROM contents:
  0000 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
 ID CRC 0xe3 (vs. 00), complete CRC 3b59d4af.
 ***WARNING***: No MII transceivers found!
[root@jlu ericding]# ./tulip-diag -f -e -e -a -m -m -p 0xe800
tulip-diag.c:v1.06 9/18/98 Donald Becker (becker@cesdis.gsfc.nasa.gov)
Digital DC21040 Tulip Tulip chip registers at 0xe800:
  00004800 01ff0000 10450008 0040b028 0040b228 02660010 812e2002 0001ebef
  00000000 00000000 0040b2c8 00096e3c 00000024 00000000 00000000 10000001
 The Rx process state is 'Waiting for packets'.
 The Tx process state is 'Idle'.
Transmit started, Receive started, half-duplex.
 The transmit unit is set to store-and-forward.
 Port selection is MII 100baseTx scrambler, half-duplex.
EEPROM contents:
  0000 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
 ID CRC 0xe3 (vs. 00), complete CRC 3b59d4af.
 ***WARNING***: No MII transceivers found!
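
To cross-check the register dump against the boot messages, here is a small
sketch that labels the sixteen words.  It assumes tulip-diag prints CSR0
through CSR15 in order, which I have not confirmed from its source.

```python
# Register rows copied from the tulip-diag output for 0xe400 above.
DUMP_E400 = """
  00004800 01ff0000 42422600 0040b820 0040ba20 02660010 816e2002 0001ebef
  00000000 00000000 0040bab0 00093cf4 00000025 00000000 00000000 10000001
"""


def parse_csr_dump(text):
    """Map 'CSR0', 'CSR1', ... to the hex words of a dump, in printed order."""
    # ASSUMPTION: the words are CSR0..CSR15 in ascending order.
    words = text.split()
    return {f"CSR{i}": int(word, 16) for i, word in enumerate(words)}


if __name__ == "__main__":
    registers = parse_csr_dump(DUMP_E400)
    print(f"CSR6 = {registers['CSR6']:08x}")
```

Under that ordering assumption, CSR6 for the card at 0xe400 is 816e2002, the
same value the driver logged for eth1 when the transmitter stopped.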

So am I experiencing bugs in the tulip driver, or in IP Masquerading, or am I
barking up the wrong tree altogether?

Thanks,
Eric
--
1105 Lexington St. Apt. 2-10                             [H] (781) 647-3411
Waltham, MA 02452-7200                 <><          [O] (508) 870-0300 x284
http://www.mit.edu/~ericding/                     [cellular] (781) 910-DING
**** Proverbs 18:22 **** Isaiah 64:4 **** 2 Chron. 16:9 **** Titus 3:5 ****