weird lockups on KNE100TX (DEC DC21142 (rev 65))

scottk@atdesk.com
Mon Oct 25 12:09:09 1999


Donald,

I was unable to provoke the behavior that I had been experiencing.  It did,
however, happen on its own this weekend.

I ran mii-diag -R on both machines that exhibited the behavior.  One
started working again, and the other didn't.

These machines appear to have identical cards, are connected to the same
switch, and otherwise act the same.  

I have the following output from both machines, captured after running
mii-diag -R.  If any of it would help, I would be happy to send it.

Output from mii-diag -R
tulip-diag 
tulip-diag -a
tulip-diag -e
tulip-diag -m

Scott

On Sat, 2 Oct 1999, Donald Becker wrote:

> On Fri, 1 Oct 1999, Ruediger Oberhage wrote:
> 
> > > > We have a 3com SuperStack II Switch 3900-36 that seems to reboot
> > > > spontaneously.  Whenever it does, it takes down networking on all
> > > > the machines on it that have DEC DC21142 (rev 65).
> > 
> > We, too, have DEC 21142/3 (rev 65)s in Adaptec's 6911A/TX boards.
> > > > When this happens, these machines are no longer able to transfer
> > > > any data. The card is basically locked up.  I have tried [...]
> ..
> > What I find remarkable here is the following: there seems to be a
> > more generic problem with link-loss with this chip and obviously
> > different (and independent) kinds of drivers. The tip to activate
> > re-negotiation, e.g. by pulling the plug, badly fails, at least
> > here for the OPENSTEP driver and our version of the Linux tulip driver.
> > Thus such a try might actually provoke the "hanging" problem.
> 
> Please provoke this behavior and then run 'mii-diag -R' to see if the link
> becomes usable.
> 
> If it does, please provoke the behavior and then send some packets to see if
> you get a transmit timeout message.  If you do, I can put a MII transceiver
> reset in the transmit timeout routine, perhaps conditional on the
> transceiver type.
> 
>     if (media_cap[dev->if_port] & MediaIsMII) {
> -    	/* Do nothing -- the media monitor should handle this. */
> +    	/* Reset to recover from a possible transceiver hang. */
> +   	mdio_write(dev, tp->phys[0], 0, 0x8000);
> 	if (tulip_debug > 1)
> 		printk(KERN_WARNING "%s: Transmit timeout using MII device.\n",
> 		     dev->name);
> 
> Donald Becker					  becker@cesdis.gsfc.nasa.gov
> USRA-CESDIS, Center of Excellence in Space Data and Information Sciences.
> Code 930.5, Goddard Space Flight Center,  Greenbelt, MD.  20771
> 301-286-0882	     http://cesdis.gsfc.nasa.gov/people/becker/whoiam.html
>
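
For reference, the value 0x8000 written to MII register 0 in the quoted
patch is the standard software-reset bit of the basic mode control
register, which the transceiver clears by itself once the reset has
finished.  Below is a minimal sketch of that write-then-poll sequence,
run against a simulated transceiver rather than real hardware.  The
names fake_phy, phy_read, phy_write and phy_reset are hypothetical and
are not the tulip driver's functions, and the post-reset register value
used by the simulation is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MII_BMCR       0x00    /* MII register 0: basic mode control */
#define BMCR_RESET     0x8000  /* bit 15: software reset, self-clearing */
#define BMCR_ANENABLE  0x1000  /* bit 12: autonegotiation enable */

/* Hypothetical simulated transceiver; stands in for real MDIO access. */
struct fake_phy {
    uint16_t regs[32];
    int busy_reads;        /* how many polls the reset bit stays set */
    int reset_reads_left;  /* countdown for the reset in progress */
};

static uint16_t phy_read(struct fake_phy *phy, int reg)
{
    assert(reg >= 0 && reg < 32);
    if (reg == MII_BMCR && (phy->regs[MII_BMCR] & BMCR_RESET)) {
        if (phy->reset_reads_left > 0)
            phy->reset_reads_left--;
        else
            /* Reset finished: assumed default, autonegotiation on. */
            phy->regs[MII_BMCR] = BMCR_ANENABLE;
    }
    return phy->regs[reg];
}

static void phy_write(struct fake_phy *phy, int reg, uint16_t val)
{
    assert(reg >= 0 && reg < 32);
    phy->regs[reg] = val;
    if (reg == MII_BMCR && (val & BMCR_RESET))
        phy->reset_reads_left = phy->busy_reads;
}

/*
 * Request a reset, then poll until the reset bit clears.
 * Returns the number of polls needed, or -1 if the bit is still
 * set after max_polls reads (the transceiver looks stuck).
 */
static int phy_reset(struct fake_phy *phy, int max_polls)
{
    phy_write(phy, MII_BMCR, BMCR_RESET);
    for (int i = 1; i <= max_polls; i++) {
        if (!(phy_read(phy, MII_BMCR) & BMCR_RESET))
            return i;
    }
    return -1;
}
```

Bounding the poll loop matters in a timeout path: if the transceiver
never clears the bit, the caller gets -1 back instead of spinning
forever, which is one way the two machines above could differ after
mii-diag -R.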