Andrey Savochkin saw@saw.sw.com.sg
Thu, 15 Jun 2000 12:24:01 +0800


On Thu, Jun 15, 2000 at 01:40:54AM +0000, Andrew Morton wrote:
> Well, I didn't say "let's put in lots of bugs" :)
> My point is very simple:
> - Drivers and/or NICs are hanging
> - The hangs are fixed by down+up or rmmod+insmod
> Hence, the hangs _could_ be unhung by appropriate action
> in the tx timeout!
> This would be a great step forward.  A sub-second hiccup and
> a few dropped packets versus a complete system outage.

The main problem is not which action the timeout routine takes.
The problem is that these routines need to be extensively debugged by the
authors/maintainers of every driver.

The TX timeout routine catches cases that shouldn't happen in real life.
It is redundant code, and it's a pity that it gets called so often.
What actions should be taken in these "impossible" cases depends on the
hardware.

Speaking of eepro100, I initially thought that restarting the
transmitter unit was meaningful and sufficient.  When I started to debug the
code, artificially trying to cause TX timeouts, I found that this is not true.
The hardware works in such a way that receiver problems lead to a TX unit
stall after a short time.  I personally consider it a hardware bug, but I have
to cope with it.  Currently, the TX timeout routine does a full reset, just as
for interface down and up.
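
To illustrate the point about eepro100, here is a rough user-space sketch.
It is not the driver code.  The struct fake_nic and every function name below
are hypothetical, and the "one tick until the stall" timing is an assumption
made only to keep the model small.  It just shows why restarting the TX unit
alone does not hold when the receiver is the real culprit, while a full reset
does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the NIC state; NOT the real eepro100 structures. */
struct fake_nic {
    bool rx_ok;       /* receiver unit is healthy */
    bool tx_running;  /* transmitter unit is making progress */
    int  full_resets; /* how many full resets were performed */
};

/* Models the behaviour described above: a receiver problem stalls the
 * TX unit after a short time (assumed here to be a single tick). */
static void fake_nic_tick(struct fake_nic *nic)
{
    if (!nic->rx_ok)
        nic->tx_running = false;
}

/* First idea: restart only the transmitter unit. */
static void tx_timeout_restart_tx_only(struct fake_nic *nic)
{
    nic->tx_running = true;
}

/* Current approach: full reset, as for interface down and up. */
static void tx_timeout_full_reset(struct fake_nic *nic)
{
    nic->rx_ok = true;
    nic->tx_running = true;
    nic->full_resets++;
}

/* Runs the model for 'ticks' steps and reports whether TX still runs. */
static bool tx_survives(struct fake_nic *nic, int ticks)
{
    assert(ticks >= 0);
    for (int i = 0; i < ticks; i++)
        fake_nic_tick(nic);
    return nic->tx_running;
}
```

Starting from a NIC with rx_ok == false, calling
tx_timeout_restart_tx_only() and then tx_survives() for a few ticks returns
false again, so the timeout would simply fire once more.  After
tx_timeout_full_reset() the same check returns true.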

If the current timeout handler fails in its mission, I will fix it.
I just need users' patience and help.  I have only one head and two hands to
clatter on the keyboard, and I can't fix everything in a flash.

Andrew, you should consider what is appropriate for your driver, based on
users' reports.  Other drivers are only hints at what may be done.

Best regards