[eepro100-bug] linux-2.0.39 / eepro100 v1.19: transmit timed out errors

Durval Menezes scyld@tmp.com.br
Thu Jan 10 11:02:03 2002


Hello,

We are seeing strange errors while using the eepro100 v1.19 on hosts
running Linux kernel 2.0.39; the machine works as a gateway, and has
two Intel EtherExpress 10/100B cards in it.

We had to upgrade to v1.19 because the eepro100 v1.05 that came built into
this kernel had problems losing promiscuous mode in the middle of long-term
libpcap sessions (any packet-capture tool, like tcpdump, after some 4-6
hours max would simply stop seeing other machines' packets; stopping the
libpcap application and restarting it again would cure the problem until
it repeated itself).

The problem with v1.19 is that, when under somewhat heavy load (let's say
two streams of 2Mbps each through it, plus others totalling less than
64Kbps), after 1-3 hours, one of the two eepro100 interfaces (eth1) simply
stops responding to packets (ICMP echos, ARPs, etc); the other eepro100
continues to work OK (eth0): the traffic is flowing from the network
directly connected to eth0 to the network directly connected to eth1;
after 10-20 minutes, the thing apparently recovers by itself, only to
stop again after 1-3 hours (we discovered this because we were transferring
a large quantity of data between machines connected to those networks).

More details:

- At the moment the machine stops responding on eth1 it generates a lot
  of warnings in the logs (see end of this message).

- When the problem is manifesting itself, tcpdumps on the machine won't show
  any packets coming to eth1, even if another machine on the network sees
  the packets.

- After 10-20 minutes the problem magically disappears, only to repeat itself
  after 1-2 hours with the same traffic.

- The problem manifested itself in 100% of our tests: while we were running
  the above-mentioned 2Mbps streams, after 1-2 hours the problem ALWAYS
  occurred.

- we moved back to the v1.05 drivers (actually, restored /lib/modules/2.0.39
  and rebooted) and the problem was fixed: the above streams run for 8 hours
  without any interruptions.

- We even replaced the eth1 card with a brand-new one, but as long as we
  were running the v1.19 driver, the problem remained.
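
To pin down the interval between failures more precisely than our rough
estimate, something like the following could be run over the syslog. It is
only a sketch: the function names are made up for illustration, the year is
an assumption (syslog timestamps carry none), and the pattern matches the
"Transmit timed out" message exactly as it appears in our logs.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Jan  9 15:56:43 dekkeret kernel: eth1: Transmit timed out: status 0090 ...
TIMEOUT_RE = re.compile(
    r"^(\w{3})\s+(\d+) (\d\d:\d\d:\d\d) \S+ kernel: (eth\d+): Transmit timed out"
)


def timeout_events(lines, year=2002):
    """Return a list of (interface, datetime) for each transmit-timeout line.

    `year` is an assumption: classic syslog timestamps do not include it.
    """
    events = []
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m:
            month, day, clock, iface = m.groups()
            stamp = datetime.strptime(
                f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
            )
            events.append((iface, stamp))
    return events


def gaps_in_minutes(events, iface):
    """Minutes between consecutive timeouts logged for one interface."""
    stamps = [t for i, t in events if i == iface]
    return [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]
```

Short gaps would belong to the same outage (the driver logs many warnings
while the interface is stuck); the long gaps are the time between outages.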

So, does anybody have any inkling why this is happening, and how to fix it?
If we can provide any more info or assistance to help solve this problem,
please contact us.

Another question: where can we find the versions of the eepro100 drivers
between 1.05 and 1.19? Version 1.05 has the promiscuous-drop problem, but
is rock-solid regarding heavy traffic; some version up to and including
v1.19 has fixed this problem, but then some other version introduced the
transmit-timeout problem; we were wondering if, as an emergency measure,
we could test versions between 1.05 and 1.19 looking for one that does not
have any of those problems... Is there a CVS anywhere? If not, does someone
maintain an eepro100.c,v file, and could email it to me?

Thanks in advance.

Best Regards,
-- 
   Durval Menezes (scyld AT tmp DOT com DOT br, http://www.tmp.com.br/)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=[
Jan  9 15:56:43 dekkeret kernel: eth1: Transmit timed out: status 0090  0000 at 9561987/9561989 commands 000c0000 400c0000 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1: Tx ring dump,  Tx queue 9561989 / 9561987:
Jan  9 15:56:43 dekkeret kernel: eth1:   0 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   1 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   2 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1: * 3 000c0000.
Jan  9 15:56:43 dekkeret kernel: eth1:   4 400c0000.
Jan  9 15:56:43 dekkeret kernel: eth1:  =5 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   6 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   7 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   8 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   9 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   10 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   11 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   12 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   13 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   14 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   15 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   16 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   17 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   18 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   19 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   20 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   21 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   22 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   23 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   24 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   25 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   26 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   27 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   28 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   29 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   30 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   31 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:Printing Rx ring (next to receive into 30377613).
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 0  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 1  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 2  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 3  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 4  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 5  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 6  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 7  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 8  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 9  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 10  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 11  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 12  c0000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 13  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 14  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 15  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 16  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 17  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 18  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 19  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 20  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 21  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 22  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 23  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 24  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 25  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 26  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 27  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 28  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 29  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 30  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 31  00000001.
Jan  9 15:56:43 dekkeret kernel:   PHY index 1 register 0 is 3000.
Jan  9 15:56:43 dekkeret kernel:   PHY index 1 register 1 is 782d.
Jan  9 15:56:43 dekkeret kernel:   PHY index 1 register 2 is 02a8.
Jan  9 15:56:43 dekkeret kernel:   PHY index 1 register 3 is 0150.
Jan  9 15:56:43 dekkeret kernel:   PHY index 1 register 4 is 05e1.
Jan  9 15:56:43 dekkeret kernel:   PHY index 1 register 5 is 0021.
Jan  9 15:56:43 dekkeret kernel:   PHY index 1 register 21 is 0000.
Jan  9 15:56:43 dekkeret kernel: eth1: Restarting the chip...
Jan  9 15:56:43 dekkeret kernel: eth1: Tx ring dump,  Tx queue 9561989 / 9561987:
Jan  9 15:56:43 dekkeret kernel: eth1:   0 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   1 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   2 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1: * 3 000c0000.
Jan  9 15:56:43 dekkeret kernel: eth1:   4 400c0000.
Jan  9 15:56:43 dekkeret kernel: eth1:  =5 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   6 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   7 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   8 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   9 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   10 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   11 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   12 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   13 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   14 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   15 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   16 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   17 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   18 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   19 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   20 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   21 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   22 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   23 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   24 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   25 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   26 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   27 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   28 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   29 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   30 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:   31 000ca000.
Jan  9 15:56:43 dekkeret kernel: eth1:Printing Rx ring (next to receive into 30377613).
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 0  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 1  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 2  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 3  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 4  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 5  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 6  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 7  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 8  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 9  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 10  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 11  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 12  c0000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 13  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 14  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 15  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 16  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 17  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 18  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 19  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 20  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 21  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 22  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 23  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 24  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 25  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 26  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 27  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 28  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 29  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 30  00000001.
Jan  9 15:56:43 dekkeret kernel:   Rx ring entry 31  00000001.
Jan  9 15:56:43 dekkeret kernel:   PHY index 1 register 0 is 3000.
Jan  9 15:56:43 dekkeret kernel:   PHY index 1 register 1 is 782d.
Jan  9 15:56:43 dekkeret kernel:   PHY index 1 register 2 is 02a8.
Jan  9 15:56:43 dekkeret kernel:   PHY index 1 register 3 is 0150.
Jan  9 15:56:43 dekkeret kernel:   PHY index 1 register 4 is 05e1.
Jan  9 15:56:43 dekkeret kernel:   PHY index 1 register 5 is 0021.
Jan  9 15:56:43 dekkeret kernel:   PHY index 1 register 21 is 0000.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=]
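
For anyone reading the "PHY index 1 register N" lines in the dump above, the
values can be decoded with the standard MII register bit definitions (IEEE
802.3 clause 22). The sketch below is not part of the driver; the table and
function names are made up for illustration, and only registers 0, 1, 4 and
5 are covered (register 21 is vendor-specific and left out).

```python
# Standard MII management register bits (IEEE 802.3 clause 22).
# Tables are listed from the highest bit down so the output order is stable.
BMCR = {  # register 0: basic mode control
    0x8000: "reset", 0x4000: "loopback", 0x2000: "speed-100",
    0x1000: "autoneg-enable", 0x0800: "power-down", 0x0400: "isolate",
    0x0200: "restart-autoneg", 0x0100: "full-duplex",
}
BMSR = {  # register 1: basic mode status
    0x8000: "100baseT4", 0x4000: "100baseTX-full", 0x2000: "100baseTX-half",
    0x1000: "10baseT-full", 0x0800: "10baseT-half",
    0x0020: "autoneg-complete", 0x0010: "remote-fault",
    0x0008: "autoneg-capable", 0x0004: "link-up", 0x0002: "jabber",
    0x0001: "extended-capability",
}
ABILITY = {  # registers 4 and 5: our advertisement / link partner ability
    0x0400: "pause", 0x0200: "100baseT4", 0x0100: "100baseTX-full",
    0x0080: "100baseTX-half", 0x0040: "10baseT-full", 0x0020: "10baseT-half",
}
REGISTERS = {0: BMCR, 1: BMSR, 4: ABILITY, 5: ABILITY}


def decode_phy(register, value):
    """Return the names of the known bits set in `value` for an MII register.

    Registers without a table here yield an empty list.
    """
    table = REGISTERS.get(register, {})
    return [name for bit, name in table.items() if value & bit]


if __name__ == "__main__":
    # Values copied from the log excerpt above.
    for reg, val in [(0, 0x3000), (1, 0x782D), (4, 0x05E1), (5, 0x0021)]:
        print(f"register {reg} = {val:04x}: {', '.join(decode_phy(reg, val))}")
```

Read this way, register 1 shows link up with autonegotiation complete, while
register 5 (0021) has only the 10baseT half-duplex ability bit set for the
link partner. Whether that has anything to do with the timeouts we cannot
say.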