weird lockups on KNE100TX (DEC DC21142 (rev 65))

Sat Oct 2 19:09:02 1999

We also have DEC 21142/3 in Adaptec's 6911A/TX boards. We are using tulip
version 91g. I can always hang the card by flooding it with packets using
ping -f.

Sep 20 02:37:09 snms0251 kernel:  In tulip_rx(), entry 30 00400720.
Sep 20 02:37:09 snms0251 kernel: eth2: Too much work during an interrupt, csr5=0xf0670040.
Sep 20 02:37:09 snms0251 kernel: eth2: exiting interrupt, csr5=0xf0680000.
Sep 20 02:37:09 snms0251 kernel: eth2: interrupt  csr5=0xf06988c0 new csr5=0xf06c0000.
Sep 20 02:37:09 snms0251 kernel:  In tulip_rx(), entry 30 00400720.
Sep 20 02:37:09 snms0251 kernel:  In tulip_rx(), entry 30 00400720.
...
Sep 20 02:37:09 snms0251 kernel:  In tulip_rx(), entry 29 00400720.
Sep 20 02:37:09 snms0251 kernel:  In tulip_rx(), entry 30 00400720.
Sep 20 02:37:09 snms0251 kernel: eth2: Re-enabling interrupts, f06988c0.
Sep 20 02:37:09 snms0251 kernel: eth2: Too much work during an interrupt, csr5=0xf06988c0.
Sep 20 02:37:09 snms0251 kernel: eth2: exiting interrupt, csr5=0xf0680000.
Sep 20 02:37:10 snms0251 kernel: eth2: 21143 negotiation status 000000c6, MII.
Sep 20 02:37:10 snms0251 kernel: eth2: MII status 786f, Link partner report 41e1.
Sep 20 02:38:10 snms0251 kernel: eth2: 21143 negotiation status 000000c6, MII.
...
Sep 20 02:42:10 snms0251 kernel: eth2: MII status 786f, Link partner report 41e1.
Sep 20 02:42:10 snms0251 kernel: eth2: Tx hung, 5688 vs. 5681.
Sep 20 02:42:10 snms0251 kernel: eth2: Transmit timeout using MII device.
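
For anyone reading the log, here is a minimal sketch that decodes the two MII
words printed above. It assumes "MII status" is the standard basic status
register (register 1) and "Link partner report" is the link partner ability
register (register 5), with the usual IEEE 802.3 bit layout. The macro and
function names are made up for this example and are not taken from tulip.c;
100baseT4 is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Basic status register bits (assumed standard MII layout). */
#define BMSR_LINK_UP        0x0004u
#define BMSR_ANEG_COMPLETE  0x0020u

/* Link partner ability bits (assumed standard MII layout). */
#define LPA_10HALF   0x0020u
#define LPA_10FULL   0x0040u
#define LPA_100HALF  0x0080u
#define LPA_100FULL  0x0100u
#define LPA_ACK      0x4000u

int mii_link_up(uint16_t bmsr)   { return (bmsr & BMSR_LINK_UP) != 0; }
int mii_aneg_done(uint16_t bmsr) { return (bmsr & BMSR_ANEG_COMPLETE) != 0; }

/* Highest-priority mode the partner advertises. Returns the speed in
 * Mbit/s and stores the duplex in *full_duplex; returns 0 if none of the
 * four modes handled here is advertised. */
int lpa_best_mode(uint16_t lpa, int *full_duplex)
{
    assert(full_duplex != NULL);
    *full_duplex = 0;
    if (lpa & LPA_100FULL) { *full_duplex = 1; return 100; }
    if (lpa & LPA_100HALF) { return 100; }
    if (lpa & LPA_10FULL)  { *full_duplex = 1; return 10; }
    if (lpa & LPA_10HALF)  { return 10; }
    return 0;
}

/* Print a one-line summary of a status / partner-ability pair. */
void print_mii_summary(uint16_t bmsr, uint16_t lpa)
{
    int fd = 0;
    int speed = lpa_best_mode(lpa, &fd);
    printf("link %s, autoneg %s, partner ack %s, partner best %d%s\n",
           mii_link_up(bmsr) ? "up" : "down",
           mii_aneg_done(bmsr) ? "complete" : "incomplete",
           (lpa & LPA_ACK) ? "yes" : "no",
           speed, fd ? "-FD" : "-HD");
}
```

Under those assumptions, print_mii_summary(0x786f, 0x41e1) reports the link as
up with autonegotiation complete, so the transceiver still claims a good link
at the moment the transmitter is reported hung.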

At first I thought it was an issue with enabling the timer interrupt in CSR7,
so I did:

--- tulip.c     Wed Sep 22 09:25:37 1999
+++ tulip.c.tmp Sat Oct  2 19:01:59 1999
@@ -2724,7 +2724,7 @@
                        /* Acknowledge all interrupt sources. */
                        outl(0x8001ffff, ioaddr + CSR5);
                        /* Clear all interrupting sources, set timer to re-enable. */
-                       outl(((~csr5) & 0x0001ebef) | 0x0800, ioaddr + CSR7);
+                       outl(((~csr5) & 0x0001ebef) | 0x8800, ioaddr + CSR7);
                        outl(12, ioaddr + CSR11);
                        break;
                }

This got me a little further but eventually the card would hang.
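
To make the difference between the two masks concrete, here is a minimal
sketch that evaluates the CSR7 expression from the patch for a given csr5
value. The macro and function names are my own labels, not identifiers from
tulip.c, and I am assuming bit 11 (0x0800) is the timer interrupt and bit 15
(0x8000) is the abnormal-interrupt summary enable, as I read the 21143
register layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit meanings; the names are labels for this example only. */
#define CSR_TIMER_INT      0x00000800u
#define CSR_ABNORMAL_INTR  0x00008000u

/* Interrupt sources the driver is willing to keep enabled (from the patch). */
#define CSR7_KEEP_MASK     0x0001ebefu

/* Same expression as in the patch: keep only the sources that are not
 * currently asserted in csr5, then force the bits in force_on. */
uint32_t csr7_mask(uint32_t csr5, uint32_t force_on)
{
    return (~csr5 & CSR7_KEEP_MASK) | force_on;
}

/* Print the mask before and after the one-bit change in the patch. */
void show_csr7_masks(uint32_t csr5)
{
    uint32_t before = csr7_mask(csr5, CSR_TIMER_INT);
    uint32_t after  = csr7_mask(csr5, CSR_TIMER_INT | CSR_ABNORMAL_INTR);

    /* The two results can differ in bit 15 only. */
    assert((before ^ after) == 0 || (before ^ after) == CSR_ABNORMAL_INTR);

    printf("csr5=0x%08x  csr7 before=0x%08x  after=0x%08x\n",
           (unsigned)csr5, (unsigned)before, (unsigned)after);
}
```

For csr5=0xf06988c0 from the log, the original expression gives 0x00006b2f
and the patched one gives 0x0000eb2f. For csr5=0xf0670040 both give
0x0000ebaf, so the change only matters when bit 15 is already set in csr5 at
the time the mask is written.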

Finally, as a work around, I made the following modification to the source
and it no longer hangs.

--- tulip.c     Wed Sep 22 09:25:37 1999
+++ tulip.c.fix Sat Oct  2 18:22:35 1999
@@ -2721,11 +2721,13 @@
                        if (tulip_debug > 1)
                                printk(KERN_WARNING "%s: Too much work during an interrupt, "
                                           "csr5=0x%8.8x.\n", dev->name, csr5);
+#if 0
                        /* Acknowledge all interrupt sources. */
                        outl(0x8001ffff, ioaddr + CSR5);
                        /* Clear all interrupting sources, set timer to re-enable. */
                        outl(((~csr5) & 0x0001ebef) | 0x0800, ioaddr + CSR7);
                        outl(12, ioaddr + CSR11);
+#endif
                        break;
                }
        } while (1);

-Leslie Kuczynski

Donald Becker wrote:

> On Fri, 1 Oct 1999, Ruediger Oberhage wrote:
>
> > > > We have a 3com SuperStack II Switch 3900-36 that seems to reboot
> > > > spontaneously.  Whenever it does, it takes down networking on all
> > > > the machines on it that have DEC DC21142 (rev 65).
> >
> > Us, we too have DEC 21142/3 (rev 65)s in Adaptec's 6911A/TX boards.
> > > > when this happens, these machines are no longer able to transfer
> > > > any data. The card is basically locked up.  I have tried [...]
> ..
> > What I find remarkable here is the following: there seems to be a
> > more generic problem with link-loss with this chip and obviously
> > different (and independent) kinds of drivers. The tip to activate
> > re-negotiation, e.g. by pulling the plug, badly fails, at least
> > here for the OPENSTEP driver and our Linux tulip version driver.
> > Thus such a try might actually provoke the "hanging" problem.
>
> Please provoke this behavior and then run 'mii-diag -R' to see if the link
> becomes usable.
>
> If it does, please provoke the behavior and then send some packets to see if
> you get a transmit timeout message.  If you do, I can put a MII transceiver
> reset in the transmit timeout routine, perhaps conditional on the
> transceiver type.
>
>     if (media_cap[dev->if_port] & MediaIsMII) {
> -       /* Do nothing -- the media monitor should handle this. */
> +       /* Reset to recover from a possible transceiver hang. */
> +       mdio_write(dev, tp->phys[0], 0, 0x8000);
>         if (tulip_debug > 1)
>                 printk(KERN_WARNING "%s: Transmit timeout using MII device.\n",
>                      dev->name);
>
> Donald Becker                                     becker@cesdis.gsfc.nasa.gov
> USRA-CESDIS, Center of Excellence in Space Data and Information Sciences.
> Code 930.5, Goddard Space Flight Center,  Greenbelt, MD.  20771
> 301-286-0882         http://cesdis.gsfc.nasa.gov/people/becker/whoiam.html