[eepro100] True on TRANSMIT ERROR TIMEOUT
Mark Cox
mark@idrive.com
Tue, 13 Jun 2000 12:52:38 -0700
>And there are many, many times when a wedged driver can be resurrected
>by a down/up or a rmmod/insmod. This means that the driver _could_ have
>automatically recovered in tx_timeout, but it simply did not do so.
>Can anyone suggest a reason why we _shouldn't_ simply reset the NIC to
>the utmost possible extent in tx_timeout? Restart media selection,
>reinitialise ring buffers, etc, etc?
I currently run shell-script hang checks across all of my webservers in case
their eepro100 drivers enter an unrecoverable transmit timeout. The script
has to completely reload the module when a card hangs, and it catches almost
half of the webservers each day, even though the machines never push more
than 15 megabits per second each. I have tried tulip drivers on Netgear
adapters with even less desirable results. I have tried the newest drivers.
I have disabled SMP where applicable. I even tried VALinux machines, hoping
that we could get a tried-and-true combination of drivers and hardware. It
turns out VALinux is aware that they have been shipping machines with these
SAME DAMN PROBLEMS all along. This is infuriating. Is it Intel's fault? Is
it Becker's fault? I really do not care. I just want the drivers to recover
from whatever PCI bus or other issue they have. Is there not a driver out
there (beta or otherwise) that recovers successfully? Has anyone found a
better fix than a 'duct tape' hang check that reloads the module?
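For reference, a 'duct tape' hang check of the kind described above might look roughly like the following minimal Python sketch. It is not the actual script from this post. The interface name `eth0`, the probe address, the use of `ping`, `ifdown`/`ifup` and `rmmod`/`modprobe`, and the two-consecutive-misses threshold are all assumptions chosen for illustration. It only restarts the interface after the probe fails; it does nothing to fix the underlying driver problem.

```python
import subprocess
import time

# Hypothetical settings: every value here is an assumption, adjust per host.
IFACE = "eth0"
MODULE = "eepro100"
PROBE_HOST = "192.168.1.1"  # assumed gateway address


def link_ok(host=PROBE_HOST):
    """Send three pings; treat exit status 0 from ping as 'link alive'."""
    result = subprocess.run(["ping", "-c", "3", "-q", host],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def reload_driver(iface=IFACE, module=MODULE):
    """Bring the interface down, unload and reload the module, bring it up."""
    for cmd in (["ifdown", iface], ["rmmod", module],
                ["modprobe", module], ["ifup", iface]):
        # check=False: continue with the next step even if this one
        # exits non-zero (e.g. the interface is already down).
        subprocess.run(cmd, check=False)


def hang_check(probe, recover, failures_needed=2, rounds=None,
               interval=30.0, sleep=time.sleep):
    """Call probe() every `interval` seconds; after `failures_needed`
    consecutive failures, call recover() and reset the miss counter.
    Loops forever when rounds is None; returns the number of recoveries."""
    misses = 0
    recoveries = 0
    done = 0
    while rounds is None or done < rounds:
        done += 1
        if probe():
            misses = 0
        else:
            misses += 1
            if misses >= failures_needed:
                recover()
                recoveries += 1
                misses = 0
        sleep(interval)
    return recoveries
```

On a real machine this would be run as root with something like `hang_check(link_ok, reload_driver)`. Requiring two consecutive misses before reloading is meant to avoid tearing down the module over a single dropped ping; passing the probe and recovery steps in as functions also makes the loop easy to exercise without touching the network.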