weird lockups on KNE100TX (DEC DC21142 (rev 65))
Leslie Kuczynski
lkuczyns@emc.com
Sat Oct 2 19:09:02 1999
We also have DEC 21142/3 chips in Adaptec's 6911A/TX boards. We are using tulip
version 91g. I can always hang the card by flooding it with packets using
ping -f.
Sep 20 02:37:09 snms0251 kernel: In tulip_rx(), entry 30 00400720.
Sep 20 02:37:09 snms0251 kernel: eth2: Too much work during an interrupt, csr5=0xf0670040.
Sep 20 02:37:09 snms0251 kernel: eth2: exiting interrupt, csr5=0xf0680000.
Sep 20 02:37:09 snms0251 kernel: eth2: interrupt csr5=0xf06988c0 new csr5=0xf06c0000.
Sep 20 02:37:09 snms0251 kernel: In tulip_rx(), entry 30 00400720.
Sep 20 02:37:09 snms0251 kernel: In tulip_rx(), entry 30 00400720.
...
Sep 20 02:37:09 snms0251 kernel: In tulip_rx(), entry 29 00400720.
Sep 20 02:37:09 snms0251 kernel: In tulip_rx(), entry 30 00400720.
Sep 20 02:37:09 snms0251 kernel: eth2: Re-enabling interrupts, f06988c0.
Sep 20 02:37:09 snms0251 kernel: eth2: Too much work during an interrupt, csr5=0xf06988c0.
Sep 20 02:37:09 snms0251 kernel: eth2: exiting interrupt, csr5=0xf0680000.
Sep 20 02:37:10 snms0251 kernel: eth2: 21143 negotiation status 000000c6, MII.
Sep 20 02:37:10 snms0251 kernel: eth2: MII status 786f, Link partner report 41e1.
Sep 20 02:38:10 snms0251 kernel: eth2: 21143 negotiation status 000000c6, MII.
...
Sep 20 02:42:10 snms0251 kernel: eth2: MII status 786f, Link partner report 41e1.
Sep 20 02:42:10 snms0251 kernel: eth2: Tx hung, 5688 vs. 5681.
Sep 20 02:42:10 snms0251 kernel: eth2: Transmit timeout using MII device.
At first I thought the problem was with enabling the timer interrupt in CSR7,
so I tried:
--- tulip.c Wed Sep 22 09:25:37 1999
+++ tulip.c.tmp Sat Oct 2 19:01:59 1999
@@ -2724,7 +2724,7 @@
/* Acknowledge all interrupt sources. */
outl(0x8001ffff, ioaddr + CSR5);
/* Clear all interrupting sources, set timer to re-enable. */
- outl(((~csr5) & 0x0001ebef) | 0x0800, ioaddr + CSR7);
+ outl(((~csr5) & 0x0001ebef) | 0x8800, ioaddr + CSR7);
outl(12, ioaddr + CSR11);
break;
}
This got me a little further, but eventually the card would still hang.
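
To make the one-bit difference in that patch concrete, here is a minimal sketch of the CSR7 mask arithmetic. The helper name csr7_throttle_mask and the macro names are hypothetical and not part of tulip.c; only the numeric constants come from the diff above. The reading of bit 11 as the timer interrupt enable and bit 15 as the abnormal-interrupt summary enable is an assumption based on the usual 2114x register layout and should be checked against the chip manual.

```c
#include <stdint.h>

/* Numeric constants copied from the diff; the names are made up here. */
#define CSR7_KEEP_MASK    0x0001ebefu  /* sources the driver may leave enabled */
#define TIMER_ONLY        0x0800u      /* original driver: bit 11 only */
#define TIMER_PLUS_BIT15  0x8800u      /* patched driver: bits 11 and 15 */

/*
 * Hypothetical helper: the value the "too much work" path writes to CSR7.
 * Every source currently pending in csr5 is masked off, then enable_bits
 * is OR-ed in so the timer can bring the interrupts back later.
 */
static uint32_t csr7_throttle_mask(uint32_t csr5, uint32_t enable_bits)
{
    return ((~csr5) & CSR7_KEEP_MASK) | enable_bits;
}
```

With the csr5 value 0xf06988c0 from the log, the original line produces 0x00006b2f and the patched line produces 0x0000eb2f. The two values differ only in bit 15.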
Finally, as a workaround, I made the following modification to the source,
and the card no longer hangs.
--- tulip.c Wed Sep 22 09:25:37 1999
+++ tulip.c.fix Sat Oct 2 18:22:35 1999
@@ -2721,11 +2721,13 @@
if (tulip_debug > 1)
printk(KERN_WARNING "%s: Too much work during an interrupt, "
"csr5=0x%8.8x.\n", dev->name, csr5);
+#if 0
/* Acknowledge all interrupt sources. */
outl(0x8001ffff, ioaddr + CSR5);
/* Clear all interrupting sources, set timer to re-enable. */
outl(((~csr5) & 0x0001ebef) | 0x0800, ioaddr + CSR7);
outl(12, ioaddr + CSR11);
+#endif
break;
}
} while (1);
-Leslie Kuczynski
Donald Becker wrote:
> On Fri, 1 Oct 1999, Ruediger Oberhage wrote:
>
> > > > We have a 3com SuperStack II Switch 3900-36 that seems to reboot
> > > > spontaneously. Whenever it does, it takes down networking on all
> > > > the machines on it that have DEC DC21142 (rev 65).
> >
> > Us, we too have DEC 21142/3 (rev 65)s in Adaptec's 6911A/TX boards.
> > > > when this happens, these machines are no longer able to transfer
> > > > any data. The card is basically locked up. I have tried [...]
> ..
> > What I find remarkable here is the following: there seems to be a
> > more generic problem with link-loss with this chip and obviously
> > different (and independent) kinds of drivers. The tip to activate
> > re-negotiation, e.g. by pulling the plug, badly fails, at least
> > here for the OPENSTEP driver and our Linux tulip version driver.
> > Thus such a try might actually provoke the "hanging" problem.
>
> Please provoke this behavior and then run 'mii-diag -R' to see if the link
> becomes usable.
>
> If it does, please provoke the behavior and then send some packets to see if
> you get a transmit timeout message. If you do, I can put a MII transceiver
> reset in the transmit timeout routine, perhaps conditional on the
> transceiver type.
>
> if (media_cap[dev->if_port] & MediaIsMII) {
> - /* Do nothing -- the media monitor should handle this. */
> + /* Reset to recover from a possible transceiver hang. */
> + mdio_write(dev, tp->phys[0], 0, 0x8000);
> if (tulip_debug > 1)
> printk(KERN_WARNING "%s: Transmit timeout using MII device.\n",
> dev->name);
>
> Donald Becker becker@cesdis.gsfc.nasa.gov
> USRA-CESDIS, Center of Excellence in Space Data and Information Sciences.
> Code 930.5, Goddard Space Flight Center, Greenbelt, MD. 20771
> 301-286-0882 http://cesdis.gsfc.nasa.gov/people/becker/whoiam.html
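
As background for the transceiver reset proposed in the quoted patch, here is a small sketch of the standard MII management bits involved. Register 0 (basic mode control) and register 1 (basic mode status) and their bit positions are the IEEE 802.3 definitions; the two helper functions are hypothetical, and it is an assumption that the "MII status" value in the log is register 1.

```c
#include <stdint.h>

/* Standard MII management registers and bits (IEEE 802.3 clause 22). */
#define MII_BMCR           0x00     /* basic mode control register */
#define MII_BMSR           0x01     /* basic mode status register */
#define BMCR_RESET         0x8000u  /* writing this bit resets the transceiver */
#define BMSR_ANEGCOMPLETE  0x0020u  /* autonegotiation complete */
#define BMSR_LSTATUS       0x0004u  /* link is up */

/* Hypothetical helpers for decoding a logged status word. */
static int bmsr_link_up(uint16_t bmsr)
{
    return (bmsr & BMSR_LSTATUS) != 0;
}

static int bmsr_autoneg_done(uint16_t bmsr)
{
    return (bmsr & BMSR_ANEGCOMPLETE) != 0;
}
```

For the logged status 786f both helpers return 1, so under that assumption the transceiver was reporting link up and negotiation complete at the time of the transmit timeout. The write of 0x8000 to register 0 in the quoted patch corresponds to BMCR_RESET.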