[tulip] Re: True on TRANSMIT ERROR TIMEOUT
Andrey Savochkin
saw@saw.sw.com.sg
Thu, 15 Jun 2000 12:24:01 +0800
Hello,
On Thu, Jun 15, 2000 at 01:40:54AM +0000, Andrew Morton wrote:
> Well, I didn't say "let's put in lots of bugs" :)
>
> My point is very simple:
>
> - Drivers and/or NICs are hanging
> - The hangs are fixed by down+up or rmmod+insmod
>
> Hence, the hangs _could_ be unhung by appropriate action
> in the tx timeout!
>
> This would be a great step forward. A sub-second hiccup and
> a few dropped packets versus a complete system outage.
The main problem is not the action in the timeout routine.
The problem is that these routines should be extensively debugged by the
authors/maintainers of all drivers.
The TX timeout routine catches cases that shouldn't happen in real life.
It's redundant code, and it's a pity that it's called so often.
What actions should be taken in these "impossible" cases depends on the
hardware.
Speaking about eepro100, I initially thought that restarting the
transmitter unit was meaningful and sufficient. When I started to debug the
code, artificially trying to cause TX timeouts, I found that this is not true.
The hardware works in such a way that receiver problems lead to a TX unit
stall after a short time. I personally consider it a hardware bug, but I have
to cope with it. Currently, the TX timeout routine does a full reset, just as
for interface down and up.
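
To illustrate the eepro100 point, here is a rough toy model in C. It is only
a sketch and not the driver's code: the struct, its fields, and all toy_*
functions are hypothetical names, and the "tick" simply encodes the behaviour
described above, where a receiver problem stalls the TX unit a short time
later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a NIC; not the real eepro100 driver state. */
struct toy_nic {
    bool rx_wedged;      /* receiver is in a bad state */
    bool tx_stalled;     /* transmitter has stopped sending */
    unsigned tx_pending; /* packets queued for transmission */
    unsigned tx_dropped; /* packets discarded during recovery */
};

/* Assumed hardware behaviour: a wedged receiver stalls the TX unit. */
static void toy_nic_tick(struct toy_nic *nic)
{
    assert(nic != NULL);
    if (nic->rx_wedged)
        nic->tx_stalled = true;
}

/* Recovery option 1: restart only the transmitter unit. */
static void toy_tx_restart(struct toy_nic *nic)
{
    assert(nic != NULL);
    nic->tx_stalled = false;
}

/* Recovery option 2: full reset, as for interface down and up. */
static void toy_full_reset(struct toy_nic *nic)
{
    assert(nic != NULL);
    nic->tx_dropped += nic->tx_pending; /* queued packets are lost */
    nic->tx_pending = 0;
    nic->rx_wedged = false;
    nic->tx_stalled = false;
}

/* Run a timeout handler, let one tick pass, and report whether the
 * transmitter is still working afterwards. */
static bool toy_recovers(struct toy_nic *nic,
                         void (*handler)(struct toy_nic *))
{
    assert(nic != NULL);
    handler(nic);
    toy_nic_tick(nic);
    return !nic->tx_stalled;
}
```

In this model, toy_recovers() with toy_tx_restart on a NIC whose receiver is
wedged reports failure, because the stall comes back on the next tick. With
toy_full_reset it reports success, at the cost of the queued packets.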
If the current timeout handler fails in its mission, I will fix it.
I just need users' patience and help. I have only one head and two hands to
clatter on the keyboard, and I can't fix everything in a flash.
Andrew, you should consider what is appropriate for your driver, based on
users' reports. Other drivers are only hints at what may be done.
Best regards
Andrey