[tulip] eth0: transmit timed out

Jim McCloskey mcclosk@ling.ucsc.edu
Wed Oct 9 03:20:01 2002


Hello. 

With Linux kernel version 2.4.18 and the tulip Ethernet driver, we
have suddenly started seeing problems here that did not occur with
earlier kernel versions, but which are of a type that has been
reported before.

Users experience extremely slow response times, and the messages below
are repeated on the console and in the logs ad nauseam:

Oct  7 21:30:34 localhost kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct  7 21:30:34 localhost kernel: eth0: Transmit timed out, status fc664010, CSR12 00000000, resetting...
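
To put a number on "ad nauseam", the watchdog lines can be tallied per
interface and per hour. The following is only a minimal sketch: the
function name count_timeouts is hypothetical, and the regular expression
assumes the syslog prefix looks exactly like the lines quoted above.

```python
import re
from collections import Counter

# Assumed line format: "<Mon> <day> <HH:MM:SS> <host> kernel: NETDEV WATCHDOG: ..."
# Group 1 captures month, day and hour; group 2 captures the interface name.
WATCHDOG = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}):\d{2}:\d{2}\s+\S+\s+kernel: "
    r"NETDEV WATCHDOG: (\w+): transmit timed out"
)

def count_timeouts(lines):
    """Count watchdog timeouts, keyed by (interface, 'Mon day hour')."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG.match(line)
        if match:
            hour, iface = match.group(1), match.group(2)
            counts[(iface, hour)] += 1
    return counts

# The two log lines quoted in this message; only the first is a watchdog line.
sample = [
    "Oct  7 21:30:34 localhost kernel: NETDEV WATCHDOG: eth0: transmit timed out",
    "Oct  7 21:30:34 localhost kernel: eth0: Transmit timed out, status "
    "fc664010, CSR12 00000000, resetting...",
]

for (iface, hour), n in sorted(count_timeouts(sample).items()):
    print(f"{iface} {hour}h: {n} timeout(s)")
```

Passing an open log file instead of the sample list would give the
hourly counts for the whole log; which file holds the kernel messages
depends on the local syslog configuration.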

Bringing eth0 down and bringing it up again provides a temporary
solution.

The kernel was compiled and installed early in September and the
problem showed itself today (October 7th) for the first time. This
time-lag is consistent with the earlier reports.

The kernel is hand-compiled from source downloaded directly from
ftp.kernel.org.

The system is Debian GNU/Linux 3.0 (stable/woody). It's connected
permanently to the University's network. Other machines similarly
connected to the same router are not showing the problem. 

NFS support is disabled in the kernel. The functions served by the
ethernet connection are routine (ssh, mail, web, sendfile).

-------
This is the relevant part of dmesg:

Linux Tulip driver version 0.9.15-pre9 (Nov 6, 2001)
PCI: Found IRQ 10 for device 00:09.0
PCI: Sharing IRQ 10 with 00:04.2
eth0: ADMtek Comet rev 17 at 0xd000, 00:20:78:1F:1E:64, IRQ 10.

-------
/proc/pci reports:

    Bus  0, device   9, function  0:
    Ethernet controller: Linksys Network Everywhere Fast Ethernet 10/100 model NC100 (rev 17).
      IRQ 10.
      Master Capable.  Latency=80.  Min Gnt=255.Max Lat=255.
      I/O at 0xd000 [0xd0ff].
      Non-prefetchable 32 bit memory at 0xe3000000 [0xe30003ff].

-------
tulip-diag -f -a
tulip-diag.c:v2.08 5/15/2001 Donald Becker (becker@scyld.com)
 http://www.scyld.com/diag/index.html
Index #1: Found a ADMtek AL985 Centaur-P adapter at 0xd000.
ADMtek AL985 Centaur-P chip registers at 0xd000:
 0x00: fff98000 ffffffff ffffffff 07c82000 07c82200 fc664010 ff972117 ffffebff
 0x40: fffe0000 fff0dff8 00000000 fffe0000 00000000 00000200 00000000 c40ffec8
 Extended registers:
 80: 00664010 03fe6bff a04c0005 ffffffff 00000000 07c82240 07c820b0 ffe0f000
 a0: f0000000 1f782000 ffff641e 00000000 40000000 00000000 00000000 00000000
 c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 20000027
 Comet duplex is reported in the MII status registers.
 Transmit started, Receive started, half-duplex.
  The Rx process state is 'Waiting for packets'.
  The Tx process state is 'Idle'.
  The transmit threshold is 128.
  Comet MAC address registers 1f782000 ffff641e
  Comet multicast filter 0000000040000000.
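
The status word in the timeout message (fc664010) is the same value
that appears sixth in the 0x00 register row above. The sketch below
pulls the Tx and Rx process-state fields out of that word. It assumes
the Comet follows the classic 21x4x CSR5 layout (bits 22:20 for Tx,
bits 19:17 for Rx), which I have not checked against ADMtek
documentation. The function names are hypothetical, and the only state
names filled in are the two that tulip-diag printed above.

```python
# Bit positions assume the classic DEC 21x4x CSR5 layout; this is an
# assumption, not something verified for the ADMtek Comet.
TX_STATE_SHIFT = 20
RX_STATE_SHIFT = 17
STATE_MASK = 0x7

# Only the names that appear in the tulip-diag output are listed here.
KNOWN_TX_NAMES = {6: "Idle"}
KNOWN_RX_NAMES = {3: "Waiting for packets"}

def decode_csr5(status):
    """Return (tx_state, rx_state) as small integers."""
    tx = (status >> TX_STATE_SHIFT) & STATE_MASK
    rx = (status >> RX_STATE_SHIFT) & STATE_MASK
    return tx, rx

def describe_csr5(status):
    """Format the two process states, naming them where known."""
    tx, rx = decode_csr5(status)
    tx_name = KNOWN_TX_NAMES.get(tx, "unknown")
    rx_name = KNOWN_RX_NAMES.get(rx, "unknown")
    return f"Tx state {tx} ({tx_name}), Rx state {rx} ({rx_name})"

print(describe_csr5(0xFC664010))
```

For fc664010 this gives Tx state 6 and Rx state 3, which agrees with
the 'Idle' and 'Waiting for packets' lines in the tulip-diag output.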

-------
(All of these describe the state of things after the problem had been
temporarily alleviated by bringing down and then bringing back up the
eth0 interface.)

I've found lots of references to similar-sounding problems, but no
definitive suggestion or solution.

Any help would be greatly appreciated. This machine is in a 
graduate student lab and a lot of people depend on it for their work.

Thanks very much indeed,

Jim McCloskey