[eepro100] wait_for_cmd_done timeout
Tue Mar 5 14:11:01 2002
I am seeing a problem with wait_for_cmd_done that is very similar to the
timeout issue that I found on GeoCrawler.
I hope my findings will help resolve this problem. Basically, what I have
found is that I run into a similar timeout problem when running Samba. If
Samba is not enabled then I don't see this error, and the ATM and eepro100
interfaces are fine. Below is some fairly detailed output; I thought it
might help track down the problem.
In summary: It appears to me that the network is being flooded with ICMP
traffic (and possibly other traffic) and that the eepro100 driver may not be
handling the errors/traffic. (I'm new to Linux device drivers, so please
bear with me here.) At the rate the ICMP error messages arrive, they may not
amount to much traffic in themselves, so I assume there is more to this
problem; for example, the two device drivers may be sharing the same tx
buffer and/or memory. Regardless of whether Samba is running (it is probably
just raising the amount of traffic enough for the eepro100 problem to
surface), I would like to fix the eepro100 driver if there is a patch
available for it.
Kernel 2.4.9-13 modified to support the ATM device drivers (eni and
ATM on Linux support software: linux-atm-2.4.0
The system runs for some period, usually less than 24 hours, and then the
interfaces die with this error (from /var/log/messages):
Mar 5 08:36:46 sla2 kernel: 10.12.136.1 sent an invalid ICMP error to a
Mar 5 08:39:51 sla2 kernel: 10.12.136.1 sent an invalid ICMP error to a
Mar 5 08:41:46 sla2 kernel: 10.12.136.1 sent an invalid ICMP error to a
Mar 5 08:46:46 sla2 kernel: 10.12.136.1 sent an invalid ICMP error to a
Mar 5 08:51:47 sla2 kernel: 10.12.136.1 sent an invalid ICMP error to a
Mar 5 08:56:46 sla2 last message repeated 2 times
Mar 5 09:01:46 sla2 kernel: 10.12.136.1 sent an invalid ICMP error to a
Mar 5 09:03:51 sla2 kernel: 10.12.136.1 sent an invalid ICMP error to a
Mar 5 09:05:14 sla2 kernel: eepro100: wait_for_cmd_done timeout!
Mar 5 09:05:46 sla2 last message repeated 24 times
Mar 5 09:05:48 sla2 last message repeated 3 times
Mar 5 09:05:49 sla2 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 5 09:05:49 sla2 kernel: eth0: Transmit timed out: status 0050 0c80 at 48699/48728 command 00030000.
Mar 5 09:05:49 sla2 kernel: eepro100: wait_for_cmd_done timeout!
Mar 5 09:06:21 sla2 last message repeated 22 times
Mar 5 09:06:22 sla2 kernel: eni(itf 0): TX DMA full
Mar 5 09:06:23 sla2 last message repeated 7 times
Mar 5 09:06:23 sla2 kernel: eepro100: wait_for_cmd_done timeout!
Mar 5 09:06:24 sla2 kernel: eni(itf 0): TX DMA full
At this point both the eth0 and atm0 interfaces stop working. Note that the
eepro100 times out first, and then the eni driver also dies with a TX DMA
full error.
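In case it helps anyone reading a similar log, here is a rough Python sketch of how the messages could be tallied, folding the "last message repeated N times" lines back into the message they refer to and recording when each message first appears. The function name summarize and the LOG sample are only illustrative (the sample is an excerpt of the log above), and the sketch assumes each "repeated" line refers to the message logged immediately before it. Lines that do not start with a timestamp are skipped.

```python
import re

# Excerpt of the /var/log/messages output quoted above.
LOG = """\
Mar 5 09:05:14 sla2 kernel: eepro100: wait_for_cmd_done timeout!
Mar 5 09:05:46 sla2 last message repeated 24 times
Mar 5 09:05:48 sla2 last message repeated 3 times
Mar 5 09:05:49 sla2 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 5 09:05:49 sla2 kernel: eepro100: wait_for_cmd_done timeout!
Mar 5 09:06:21 sla2 last message repeated 22 times
Mar 5 09:06:22 sla2 kernel: eni(itf 0): TX DMA full
Mar 5 09:06:23 sla2 last message repeated 7 times
Mar 5 09:06:23 sla2 kernel: eepro100: wait_for_cmd_done timeout!
Mar 5 09:06:24 sla2 kernel: eni(itf 0): TX DMA full
"""

# "Mon D HH:MM:SS host rest-of-line"
LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) (.*)$")
REPEAT = re.compile(r"^last message repeated (\d+) times$")


def summarize(text):
    """Return (counts, first_seen) dictionaries keyed by message text."""
    counts = {}
    first_seen = {}
    last_msg = None
    for raw in text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # not a timestamped line, e.g. a wrapped continuation
        stamp, _host, rest = m.groups()
        rep = REPEAT.match(rest)
        if rep:
            # Assumption: the repeat count belongs to the previous message.
            if last_msg is not None:
                counts[last_msg] += int(rep.group(1))
            continue
        msg = rest[len("kernel: "):] if rest.startswith("kernel: ") else rest
        counts[msg] = counts.get(msg, 0) + 1
        first_seen.setdefault(msg, stamp)
        last_msg = msg
    return counts, first_seen


if __name__ == "__main__":
    counts, first_seen = summarize(LOG)
    for msg in sorted(first_seen, key=first_seen.get):
        print(first_seen[msg], counts[msg], msg)
```

Sorting by first-seen time makes the ordering easy to see: the eepro100 timeout shows up before the eni "TX DMA full" message.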
[root@sla2 root]# more ifconfig.txt
atm0 Link encap:UNSPEC HWaddr
inet addr:10.6.160.254 Mask:255.255.255.252
UP RUNNING MTU:1500 Metric:1
RX packets:4860 errors:0 dropped:0 overruns:0 frame:0
TX packets:4860 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:408240 (398.6 Kb) TX bytes:447120 (436.6 Kb)
eth0 Link encap:Ethernet HWaddr 00:50:8B:D3:92:7C
inet addr:216.90.89.xx Bcast:126.96.36.199 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:504622 errors:0 dropped:0 overruns:0 frame:0
TX packets:47444 errors:289 dropped:0 overruns:0 carrier:0
RX bytes:50644863 (48.2 Mb) TX bytes:10479503 (9.9 Mb)
Interrupt:10 Base address:0x2000
Note the collisions are on eth0. I know that the ICMP error above is caused
by our network configuration: Samba broadcasts an NBNS message on
188.8.131.52 and 10.12.136.1 replies with the above error. Ethereal shows
the error as Type 3 (Destination Unreachable), Code 3 (Port Unreachable).
I have not yet been able to determine why this message is being sent or how
to stop the Nortel Shasta from sending it. I wanted to point out that the
eepro100 is timing out and is affecting the ATM device driver too.
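For anyone who wants to check the type/code pair without Ethereal, the first two bytes of the ICMP header carry the type and the code, followed by a 16-bit checksum. Below is a minimal Python sketch of decoding them. The function name describe_icmp and the small lookup tables are my own illustration and cover only a few common values; the bytes passed in are made up, not taken from a capture.

```python
import struct

# A few well-known ICMP types (not an exhaustive list).
ICMP_TYPES = {
    0: "Echo Reply",
    3: "Destination Unreachable",
    8: "Echo Request",
    11: "Time Exceeded",
}

# Codes defined for type 3 (first four only).
DEST_UNREACHABLE_CODES = {
    0: "Network Unreachable",
    1: "Host Unreachable",
    2: "Protocol Unreachable",
    3: "Port Unreachable",
}


def describe_icmp(header):
    """Decode the type and code from the first 4 bytes of an ICMP header."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes of ICMP header")
    # Network byte order: 1-byte type, 1-byte code, 2-byte checksum.
    icmp_type, code, _checksum = struct.unpack("!BBH", header[:4])
    type_name = ICMP_TYPES.get(icmp_type, "type %d" % icmp_type)
    if icmp_type == 3:
        code_name = DEST_UNREACHABLE_CODES.get(code, "code %d" % code)
    else:
        code_name = "code %d" % code
    return type_name, code_name


if __name__ == "__main__":
    # Hypothetical header bytes: type 3, code 3, checksum left as zero.
    print(describe_icmp(bytes([3, 3, 0, 0])))
```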
The eepro100 version is:
"eepro100.c:v1.09j-t 9/29/99 Donald Becker
"eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin
<email@example.com> and others\n";
I know there is a lot of info here, but after reading the thread on
wait_for_cmd_done, I thought this might shed some light on the problem and
show that it may not be confined to the newer/experimental kernels.
Any help would be much appreciated.