interface dies under network load on SMP machines

Wolfgang Wander wwc@wizard.mit.edu
Thu Aug 27 11:33:49 1998


Hi,

   Some of you (Donald?) might still remember me from about 1.5 years
ago. I am back with a new problem, this time with SMP kernels 2.0.35+
(the + meaning Alan's pre36 kernels as well) and tulip.c (up to v0.89K,
with SMPCHECK compiled in).

   The interface goes dead from time to time without leaving any log
messages; a simple ifconfig down/up brings it back to life. To
stabilise the systems I've written a small program that checks the
network and restarts the interface if required, so everything is
nearly perfectly fine...
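
The watchdog program itself is not included in this post. As a rough
illustration only, a check-and-restart loop of that kind could look like
the Python sketch below. The interface name eth0, the probe host
192.168.1.1, the retry count and the 30 second interval are all
assumptions made for the example, not details of the real program.

```python
import subprocess
import time

IFACE = "eth0"              # assumption: name of the tulip interface
PROBE_HOST = "192.168.1.1"  # assumption: a host that should always answer


def link_alive(host=PROBE_HOST):
    """Return True if a single ping to `host` succeeds within 5 seconds."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=5,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def restart_interface(iface=IFACE, run=subprocess.run):
    """Cycle the interface with ifconfig down/up.

    Depending on the system, routes may have to be re-added afterwards.
    """
    run(["ifconfig", iface, "down"], check=True)
    run(["ifconfig", iface, "up"], check=True)


def watchdog_step(probe=link_alive, restart=restart_interface, retries=3):
    """Probe up to `retries` times; restart only if every probe fails.

    Returns True if a restart was triggered.
    """
    for _ in range(retries):
        if probe():
            return False
    restart()
    return True


def main(interval=30):
    """Run the check forever, sleeping `interval` seconds between rounds."""
    while True:
        if watchdog_step():
            print(f"{IFACE} restarted at {time.ctime()}")
        time.sleep(interval)
```

Calling main() as root starts the loop. Requiring several failed probes
before a restart avoids cycling the interface on a single lost packet.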

   However, there might still be a need for debugging...

Here are the tulip-diag outputs for the Kingston KNE 10/100 cards:

-a:

tulip-diag.c:v1.04 8/10/98 Donald Becker (becker@cesdis.gsfc.nasa.gov)
Chip Index #1: Found a DC21140 Tulip card at PCI bus 0, device 19 I/O 0xec00.
Digital DS21140 Tulip chip registers at 0xec00:
  ffa04800 ffffffff ffffffff 00fff028 00fff228 fc000102 320e0000 fffe0000
  e0000000 fffd83ff ffffffff fffe0000 ffffff00 ffffffff 1c09fdc0 fffffec8
 The Rx process state is 'Stopped'.
 The Tx process state is 'Stopped'.
Transmit stopped, Receive stopped, half-duplex.
 The transmit threshold is 128.
 Port selection is MII, half-duplex.
EEPROM transceiver/media description for the DC21140 chip.

Leaf node at offset 30, default media type 0800 (Autosense).
 CSR12 direction setting bits 00.
 1 transceiver description blocks:
   MII interface PHY 0 (media type 11).
 MII PHY found at address 1, status 0x782d.
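
For readers who want to relate the raw register words to the summary
lines, here is a small Python sketch that decodes the same dump. It
assumes the sixteen words are CSR0 to CSR15 in order, and it uses the
DC21140 bit positions as I understand them (CSR5 bits 19:17 and 22:20
for the Rx/Tx process states; CSR6 bit 1 start receive, bit 13 start
transmit, bit 9 full duplex, bit 18 port select). Please check these
against the datasheet before relying on them.

```python
# Register words copied from the tulip-diag -a output above.
CSR_DUMP = """
ffa04800 ffffffff ffffffff 00fff028 00fff228 fc000102 320e0000 fffe0000
e0000000 fffd83ff ffffffff fffe0000 ffffff00 ffffffff 1c09fdc0 fffffec8
"""


def parse_csrs(dump):
    """Turn the whitespace-separated hex words into a list of integers."""
    return [int(word, 16) for word in dump.split()]


def decode_csrs(csrs):
    """Extract the fields that tulip-diag summarises (assumed bit layout)."""
    csr5, csr6 = csrs[5], csrs[6]
    return {
        "rx_ring_base": csrs[3],              # CSR3: Rx descriptor list base
        "tx_ring_base": csrs[4],              # CSR4: Tx descriptor list base
        "rx_state": (csr5 >> 17) & 0x7,       # 0 means stopped
        "tx_state": (csr5 >> 20) & 0x7,       # 0 means stopped
        "rx_started": bool(csr6 & (1 << 1)),
        "tx_started": bool(csr6 & (1 << 13)),
        "full_duplex": bool(csr6 & (1 << 9)),
        "port_select_mii": bool(csr6 & (1 << 18)),
    }


info = decode_csrs(parse_csrs(CSR_DUMP))
```

With this layout both process states decode to 0 (stopped), neither
start bit is set, the duplex bit is clear and the port-select bit is
set, which agrees with the text tulip-diag printed. The ring base
addresses in CSR3 and CSR4 are 00fff028 and 00fff228.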


-e:

tulip-diag.c:v1.04 8/10/98 Donald Becker (becker@cesdis.gsfc.nasa.gov)
Chip Index #1: Found a DC21140 Tulip card at PCI bus 0, device 19 I/O 0xec00.
EEPROM transceiver/media description for the DC21140 chip.

Leaf node at offset 30, default media type 0800 (Autosense).
 CSR12 direction setting bits 00.
 1 transceiver description blocks:
  Media MII,  block type 1.
   MII interface PHY 0 (media type 11).
 MII PHY found at address 1, status 0x782d.

-m:

tulip-diag.c:v1.04 8/10/98 Donald Becker (becker@cesdis.gsfc.nasa.gov)
Chip Index #1: Found a DC21140 Tulip card at PCI bus 0, device 19 I/O 0xec00.
EEPROM transceiver/media description for the DC21140 chip.

Leaf node at offset 30, default media type 0800 (Autosense).
 CSR12 direction setting bits 00.
 1 transceiver description blocks:
   MII interface PHY 0 (media type 11).
 MII PHY found at address 1, status 0x782d.
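
The MII status 0x782d can be read bit by bit. The sketch below assumes
the value is the standard MII basic status register (register 1) as
defined by IEEE 802.3; the function name is made up for this example.

```python
# Bit meanings of the standard MII basic status register (register 1).
BMSR_BITS = {
    15: "100BASE-T4 capable",
    14: "100BASE-TX full duplex capable",
    13: "100BASE-TX half duplex capable",
    12: "10BASE-T full duplex capable",
    11: "10BASE-T half duplex capable",
    5: "autonegotiation complete",
    4: "remote fault",
    3: "autonegotiation capable",
    2: "link up",
    1: "jabber detected",
    0: "extended register set",
}


def decode_bmsr(status):
    """Return the names of all status bits that are set in `status`."""
    return [name for bit, name in BMSR_BITS.items() if status & (1 << bit)]


flags = decode_bmsr(0x782D)
```

Under that assumption 0x782d reports link up, autonegotiation complete,
and no remote fault or jabber. The link bit latches low, so a single
read may reflect an earlier link drop rather than the current state.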

The machines in question are dual PII 400 boxes running on Tyan
motherboards.


When the interface is dead, ifconfig down dumps the Rx/Tx status flags
as follows:

Rx ring 00fff028:  00400320 00660720 004a0320 00d60320 00d60320 004a0320 00d60320 00d60320 004a0320 00d60320 00d60320 00ae0320 00ae0320 00d60320 00ae0320 00d60320 00d60320 00660720 00400320 00400320 00400320 00400320 00400320 00660720 00400320 00660720 00660720 00400320 00400320 00400320 00400320 00400320
Tx ring 00fff228:  7fff3000 7fff3000 7fff3000 7fff3000 7fff3000 7fff3000 7fff3000 7fff3000 7fff3000 7fff3000 7fff3000 7fff3000 7fff3000 7fff3000 7fff3000 7fff3000
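
To make the ring dump easier to read, here is a hedged Python sketch
that splits each status word into fields. The bit positions are an
assumption based on the DC21140 descriptor format as I understand it
(bit 31 OWN, bits 29:16 frame length, bit 15 error summary, bit 10
multicast, bits 9 and 8 first/last descriptor) and should be verified
against the datasheet.

```python
# Status words copied from the Rx and Tx ring lines above.
RX_RING = """
00400320 00660720 004a0320 00d60320 00d60320 004a0320 00d60320 00d60320
004a0320 00d60320 00d60320 00ae0320 00ae0320 00d60320 00ae0320 00d60320
00d60320 00660720 00400320 00400320 00400320 00400320 00400320 00660720
00400320 00660720 00660720 00400320 00400320 00400320 00400320 00400320
"""
TX_RING = " ".join(["7fff3000"] * 16)


def decode_rx_status(word):
    """Split one Rx status word into fields (assumed bit layout)."""
    return {
        "owned_by_chip": bool(word & (1 << 31)),
        "length": (word >> 16) & 0x3FFF,
        "error": bool(word & (1 << 15)),
        "multicast": bool(word & (1 << 10)),
        "first": bool(word & (1 << 9)),
        "last": bool(word & (1 << 8)),
    }


rx = [decode_rx_status(int(w, 16)) for w in RX_RING.split()]
tx_owned_by_chip = [bool(int(w, 16) & (1 << 31)) for w in TX_RING.split()]

rx_chip_owned = sum(d["owned_by_chip"] for d in rx)
rx_lengths = sorted({d["length"] for d in rx})
```

With this layout none of the 32 Rx entries and none of the 16 Tx
entries has the OWN bit set, no Rx entry flags an error, and the Rx
frame lengths are 64, 74, 102, 174 and 214 bytes. What that implies
depends on when the driver prints the rings during ifconfig down.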

   I hope this is helpful; if not, please let me know and I can dig
deeper...

                  Wolfgang

--      
   _/  _/ _/  _/ _/_/_/           Wolfgang Wander . http://wizard.mit.edu/~wwc/
  _/  _/ _/  _/ _/                       MIT-LNS .         Email: wwc@mit.edu
 _/_/_/ _/_/_/ _/            77 Mass Ave 24-030A.        Tel: (617) 253 5222
_/_/_/ _/_/_/ _/_/_/  Cambridge, MA 02139-4307 .        Fax: (617) 258 6591
---------- fp: 69 35 79 36 E2 E9 69 AC  DE 4A 36 8E 5F AC 2A 2E  ----------