weird lockups on KNE100TX (DEC DC21142 (rev 65))

Donald Becker becker@cesdis1.gsfc.nasa.gov
Thu Sep 30 12:08:11 1999

On Thu, 30 Sep 1999 scottk@plover.atdesk.com wrote:

> We have a 3com SuperStack II Switch 3900-36 that seems to reboot
> spontaneously.  Whenever it does, it takes down networking on all the
> machines on it that have DEC DC21142 (rev 65).

Note: that's actually a 21143-TD chip.  Digital didn't want to change the
device ID, and kept throwing in (sometimes incompatible) features based on
the revision number.

>  when this happens, these
> machines are no longer able to transfer any data. The card is basically
> locked up.  I have tried stopping and restarting networking, removing and
> re-installing the driver, even warm rebooting fails. The only way to make
> these cards work again, is to power cycle them to allow the hardware to
> reset.  I ran Becker's tulip-diag tool against these cards while they were
> locked up and it thought that they were working..

Wow.  It sounds as if either the transceiver is locking up, or the switch
doesn't recover as long as it sees link beat.

Try the following:
   Unplug and reconnect the cable -- does it now work?
   Run 'mii-diag -R' to reset the transceiver.

Both of these approaches will drop link beat and retrigger autonegotiation.
If either fixes the problem, you still don't know which device is to blame.
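
For reference, here is a rough sketch of what a transceiver reset and an
autonegotiation restart look like at the MII register level.  The bit
positions are the standard ones for MII register 0 (the basic mode control
register).  The function names are made up for illustration, and this is
not code from mii-diag or the driver; it only computes the value that would
be written, and does not touch any hardware.

```python
# Standard bits in MII register 0 (basic mode control register, BMCR).
BMCR_RESET = 0x8000      # software reset of the transceiver (self-clearing)
BMCR_ANENABLE = 0x1000   # autonegotiation enable
BMCR_ANRESTART = 0x0200  # restart autonegotiation (self-clearing)


def bmcr_for_reset(bmcr):
    """Return the BMCR value that requests a transceiver reset.

    Hypothetical helper: only computes the 16-bit word to write.
    """
    return (bmcr | BMCR_RESET) & 0xFFFF


def bmcr_for_restart(bmcr):
    """Return the BMCR value that enables and restarts autonegotiation.

    Hypothetical helper: only computes the 16-bit word to write.
    """
    return (bmcr | BMCR_ANENABLE | BMCR_ANRESTART) & 0xFFFF


if __name__ == "__main__":
    current = 0x1000  # example value: autonegotiation already enabled
    print("reset   -> 0x%04x" % bmcr_for_reset(current))
    print("restart -> 0x%04x" % bmcr_for_restart(current))
```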

I deliberately have not put transceiver reset code into the driver.  There
are cases where an interface is brought down to change a parameter and then
brought immediately back up.  Triggering a 3-second autonegotiation cycle
each time would be a disaster.

> There are other machines on the same switch with DEC DC21140 (rev 34)
> (tulip.c:v0.88) that do not lock up.  

They likely are not doing autonegotiation.  They almost certainly have a
different transceiver type.

> We have 3 machines showing this lock-up behavior, all are Kingston
> KNE100TX's.
> They are all using tulip.c:v0.91e 5/27/99.  

The driver version likely has nothing to do with this problem.

> m16 kernel: eth0:  Index #0 - Media MII (#11) described by a 21142 MII
> PHY (3) block. 

> Any suggestions other than replace the NICS and/or the switch?

Does the interface get transmitter timeouts?
What does 'mii-diag' or 'tulip-diag -mm' report when the network link is hung?
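
If it helps when reading the raw register dump, the sketch below decodes
the interesting bits of MII register 1 (the basic mode status register).
The bit positions are the standard ones; the function name and the sample
values are only illustrations, not output from your cards.  Keep in mind
that the link status bit latches low, so the register should be read twice
to see the current state.

```python
# Standard bits in MII register 1 (basic mode status register, BMSR).
BMSR_ANEGCOMPLETE = 0x0020  # autonegotiation complete
BMSR_RFAULT = 0x0010        # remote fault detected
BMSR_LSTATUS = 0x0004       # link status (latches low until read)


def decode_bmsr(bmsr):
    """Decode a raw 16-bit BMSR value into a few named flags.

    Hypothetical helper for reading a register dump by hand.
    """
    return {
        "link_up": bool(bmsr & BMSR_LSTATUS),
        "autoneg_complete": bool(bmsr & BMSR_ANEGCOMPLETE),
        "remote_fault": bool(bmsr & BMSR_RFAULT),
    }


if __name__ == "__main__":
    # Sample values chosen for illustration only.
    for raw in (0x782D, 0x7809):
        print("0x%04x -> %s" % (raw, decode_bmsr(raw)))
```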

Donald Becker					  becker@cesdis.gsfc.nasa.gov
USRA-CESDIS, Center of Excellence in Space Data and Information Sciences.
Code 930.5, Goddard Space Flight Center,  Greenbelt, MD.  20771
301-286-0882	     http://cesdis.gsfc.nasa.gov/people/becker/whoiam.html