[vortex-bug] eth1: transmit timed out, tx_status 00 status e000

Sat, 08 Sep 2001 19:23:07 +0200

Hello,

I seem to have similar problems with a PCI 3c905 100BaseTX [Boomerang]
(running at 10 Mbit/s, though) on a SuSE 7.0 / 2.2.16 system.
It looks like the two machines in our office are connected to a
hub (or at least something which behaves like a hub): with tcpdump
I can see the packets for one machine on the other and vice versa.
If I transfer large files from the non-problematic machine to the
other machine, I get transmit hangs:

Sep  8 04:04:07 rsl3eth12 kernel: eth0: transmit timed out, tx_status 00 status e000.
Sep  8 04:04:07 rsl3eth12 kernel:   Flags; bus-master 1, full 1; dirty 87268 current 87284.
Sep  8 04:04:07 rsl3eth12 kernel:   Transmit list 00000000 vs. c07bb240.
Sep  8 04:04:07 rsl3eth12 kernel:   0: @c07bb200  length 800000aa status 000000aa
...
Sep  8 04:04:33 rsl3eth12 kernel:   15: @c07bb2f0  length 8000002a status 0000002a
Sep  8 04:04:33 rsl3eth12 kernel: eth0: Resetting the Tx ring pointer.
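
For reference, the "status e000" word in these messages can be picked
apart with a few lines of C. This is only a sketch: the field layout
(register window number in bits 15-13, CmdInProgress as bit 12, value
0x1000) is an assumption, and decode_el3_status() is a made-up helper
name, not something taken from 3c59x.c.

```c
#include <stdint.h>

/* Assumed layout of the 16-bit EL3_STATUS word:
 *   bits 15-13  currently selected register window
 *   bit  12     CmdInProgress (assumed value 0x1000)
 *   bits 11-0   event/interrupt flags, treated as opaque here
 */
struct el3_status {
    unsigned window;          /* 0..7 */
    unsigned cmd_in_progress; /* 0 or 1 */
    unsigned event_bits;      /* low 12 bits, not interpreted */
};

/* Hypothetical helper: split a raw status word into its fields. */
struct el3_status decode_el3_status(uint16_t status)
{
    struct el3_status s;
    s.window = (status >> 13) & 0x7u;
    s.cmd_in_progress = (status >> 12) & 0x1u;
    s.event_bits = status & 0x0fffu;
    return s;
}
```

With the value from the log, decode_el3_status(0xe000) gives window 7,
cmd_in_progress 0 and no event bits set, which under the assumed layout
would mean no command was pending when the timeout was reported.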

Looking through the file 3c59x.c, I found some suspicious
statements like:

  outw(TxReset, ioaddr + EL3_CMD);
  for (i = 2000; i >= 0 ; i--)
      if ( ! (inw(ioaddr + EL3_STATUS) & CmdInProgress))
         break;

Just out of curiosity, I printed out the number of loop iterations
used (maybe with today's fast machines, 200 or 2000 is not enough
any more). However, it seems that the break is already executed
after one iteration.
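
The iteration count can also be examined outside the kernel. The sketch
below is a user-space variant of the loop, not the driver's code:
wait_cmd_done(), status_reader and fake_status() are hypothetical names,
the register read is replaced by a callback, and CMD_IN_PROGRESS is
assumed to be 0x1000. Unlike the loop above, it reports how many reads
were needed and returns -1 if the flag never cleared.

```c
#include <stdint.h>

#define CMD_IN_PROGRESS 0x1000u /* assumed value of CmdInProgress */

/* Hypothetical stand-in for inw(ioaddr + EL3_STATUS). */
typedef uint16_t (*status_reader)(void *ctx);

/* Poll until CmdInProgress clears.
 * Returns the number of status reads performed (1 = cleared on the
 * first read), or -1 if the flag was still set after max_polls reads. */
int wait_cmd_done(status_reader read_status, void *ctx, int max_polls)
{
    for (int polls = 1; polls <= max_polls; polls++) {
        if (!(read_status(ctx) & CMD_IN_PROGRESS))
            return polls;
    }
    return -1;
}

/* Fake register for experiments: ctx points to an int holding the
 * number of reads for which the command should still appear busy. */
uint16_t fake_status(void *ctx)
{
    int *busy_reads = ctx;
    if (*busy_reads > 0) {
        (*busy_reads)--;
        return (uint16_t)(0xe000u | CMD_IN_PROGRESS);
    }
    return 0xe000u;
}
```

With the busy counter set to 0, wait_cmd_done(fake_status, &busy, 2000)
returns 1, which matches the single iteration observed above. A return
of -1 would mean the limit ran out with the flag still set, a case the
original loop falls through without reporting.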

Running mii-diag -v during the transmit timeout phase seems
to block until the transmit timeouts have ended.

I ran vortex-diag -aaee during normal operation and during
the timeouts. Here's the diff (two dumps taken during the hangs
look very similar to each other, though):

diff good bad
8c8
<   Window 1: FIFO FIFO 0000 2000 8000 000e 134c 2000.
---
>   Window 1: FIFO FIFO 0000 2000 8000 00ff 13fc 2000.
10,11c10,11
<   Window 3: 02d8 0163 0000 0000 e040 0bff 134f 6000.
<   Window 4: 0000 06d0 0000 0ec0 0003 9822 0100 8000.
---
>   Window 3: 02d8 0163 0000 0000 e040 0bff 13ff 6000.
>   Window 4: 0000 06d0 0000 0cc0 0003 8822 0000 8000.
13,14c13,14
<   Window 6: 0000 0000 0000 6200 1000 5f56 199a c000.
<   Window 7: b2c8 007b 0000 0000 8000 0023 5000 e000.
---
>   Window 6: 0000 0000 0000 0300 1000 00b4 0000 c000.
>   Window 7: 11f8 0630 0000 0000 8000 00ff 5000 e000.
17,18c17,18
<   0xCC20: 00000021 00000000 00b156c2 06000070
<   0xCC30: 00000000 0000ca1f 007bb130 00000000
---
>   0xCC20: 00000021 063012b0 0ced0012 ff0005d2
>   0xCC30: 00000000 0000b482 06301000 00000000  

To me it looks like the problematic machine doesn't get
any chance to access the link (although I thought CSMA/CD
should prevent this). The other machine (which works
without any problems) has a
3c905C-TX [Fast Etherlink] (rev 74).
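
To see exactly which bits changed between the good and bad dumps, the
words of one "Window N" line can be XORed pairwise. A minimal sketch,
assuming eight 16-bit words per window as printed in the diff;
diff_window() and WINDOW_WORDS are names made up for this example.

```c
#include <stdint.h>

#define WINDOW_WORDS 8 /* each "Window N" line of the dump has 8 words */

/* Hypothetical helper: XOR two register-window dumps word by word.
 * mask[i] gets a 1 bit wherever good[i] and bad[i] differ.
 * Returns the number of words that changed. */
int diff_window(const uint16_t good[WINDOW_WORDS],
                const uint16_t bad[WINDOW_WORDS],
                uint16_t mask[WINDOW_WORDS])
{
    int changed = 0;
    for (int i = 0; i < WINDOW_WORDS; i++) {
        mask[i] = (uint16_t)(good[i] ^ bad[i]);
        if (mask[i] != 0)
            changed++;
    }
    return changed;
}
```

Fed the two Window 4 lines from the diff, this reports three changed
words, with masks 0200, 1000 and 0100 at word positions 3, 5 and 6
(counting from 0). What those bits mean has to be looked up in the chip
documentation; the sketch only narrows down where to look.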

thanks for any hints,

André

> > Aaron Baird wrote: 
> > 
> > I have a firewall that is running RedHat 7.1, 2.4.2 kernel, iptables, 
> > and two 3com 3c905b network cards. We have a 4 workstation LAN that 
> > uses this firewall as the main gateway (and a Cisco router as the 
> > actual gateway). About every 24 hours or whenever I do a large, high-speed, network 
> > transfer, I get an error message (shown below) that renders the firewall useless until I do a reboot 
> > (ctl+alt+delete). And the strange thing is, it always occurs on eth1, never on eth0 (eth1 is connected 
> > to a public switch that the router is also connected to). 
> > 
> > Error Message (from log): 
> > 
> > Jul 24 13:42:38 castor kernel: NETDEV WATCHDOG: eth1: transmit timed 
> > out 
> > Jul 24 13:42:38 castor kernel: eth1: transmit timed out, tx_status 00 
> > status e000. 
> > Jul 24 13:42:38 castor kernel: diagnostics: net 0cda media 8880 dma 
> > 000000a0. 
> > Jul 24 13:42:38 castor kernel: Flags; bus-master 1, dirty 61401(9) 
<snip>