Transmit timed out: which driver is best now?

Andrey Savochkin saw@saw.sw.com.sg
Tue Mar 14 21:11:02 2000


Hello,

On Tue, Mar 14, 2000 at 03:30:58PM -0800, Michael J. Rensing wrote:
> I've got a 39 node cluster running, all with the Intel 82588 chip.
> Currently, the systems are running a 2.2.12-20 kernel with the 1.06
> eepro100 drivers. I'm getting the "Transmit timed out / Trying to
> restart the transmitter" problem so many people have been discussing.
> 
> Can anyone tell me what the best (current) solution is?
> a) upgrade to 1.09l
> b) use Intel drivers
> c) use Andrei's drivers (where do I get them)
> d) other solution

I recommend (c) :-)
The necessary changes are incorporated into 2.2.15pre13 and later kernels
in ftp.kernel.org/pub/linux/kernel/people/alan/2.2.15pre/

For 2.3 kernels the driver is available at
ftp://ftp.sw.com.sg/pub/Linux/people/saw/kernel/v2.3/

> 
> Also, will this really fix things, or could I be risking further
> problems?

My changes address the root causes of these problems. They implement:
 - accurate tbusy management without race conditions (except the one forced
   by the hardware design)
 - correct buffer ring refilling (incorrect refilling is the usual cause of
   card hangs, and thus of timeouts, under high load)
 - correct multicast list setup (avoiding stray pointers in the TX ring)
 - a thoroughly tested TX timeout handler that avoids looping timeouts caused
   by incomplete reset and reconfiguration.
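
To illustrate the buffer ring refilling point, here is a minimal user-space
sketch in C. It is not code from the driver: the struct, the function names,
the ring size and the buffer size are all hypothetical, and malloc() stands in
for the kernel's buffer allocation. It only shows the general idea, assuming a
ring tracked by two counters: a failed allocation must leave the ring in a
consistent state so that a later refill can pick up where it stopped, instead
of leaving the card with no buffers to receive into.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_RING_SIZE 8      /* hypothetical ring size (power of two) */
#define RX_BUF_SIZE  1536   /* hypothetical receive buffer size */

/* Hypothetical ring state, not the driver's real data structure. */
struct rx_ring {
    void *buf[RX_RING_SIZE];  /* NULL means the slot has no buffer */
    unsigned int cur_rx;      /* next slot to be consumed */
    unsigned int dirty_rx;    /* oldest slot still waiting for a buffer */
};

/* Number of slots that currently hold a buffer the card could fill. */
static unsigned int rx_ring_available(const struct rx_ring *r)
{
    return RX_RING_SIZE - (r->cur_rx - r->dirty_rx);
}

/*
 * Give a fresh buffer to every empty slot, oldest first.
 * Stops at the first allocation failure and leaves dirty_rx pointing
 * at the slot that still needs a buffer, so a later call can retry.
 * Returns the number of slots filled by this call.
 */
static unsigned int rx_ring_refill(struct rx_ring *r,
                                   void *(*alloc_fn)(size_t))
{
    unsigned int filled = 0;

    while (r->dirty_rx != r->cur_rx) {
        unsigned int slot = r->dirty_rx % RX_RING_SIZE;
        void *b = alloc_fn(RX_BUF_SIZE);

        if (b == NULL)
            break;            /* out of memory: retry on the next call */
        r->buf[slot] = b;
        r->dirty_rx++;
        filled++;
    }
    return filled;
}

/* Start with every slot marked as consumed, then refill the whole ring. */
static bool rx_ring_init(struct rx_ring *r, void *(*alloc_fn)(size_t))
{
    for (unsigned int i = 0; i < RX_RING_SIZE; i++)
        r->buf[i] = NULL;
    r->dirty_rx = 0;
    r->cur_rx = RX_RING_SIZE;
    return rx_ring_refill(r, alloc_fn) == RX_RING_SIZE;
}

/*
 * Simulate receiving one packet: take the buffer out of the current slot
 * and hand it to the caller, who now owns it. Returns NULL when every
 * slot is empty, which models a card with nowhere left to receive.
 */
static void *rx_ring_consume(struct rx_ring *r)
{
    if (r->cur_rx - r->dirty_rx >= RX_RING_SIZE)
        return NULL;

    unsigned int slot = r->cur_rx % RX_RING_SIZE;
    void *b = r->buf[slot];

    r->buf[slot] = NULL;
    r->cur_rx++;
    return b;
}

/* Allocator that always fails, used to simulate memory pressure. */
static void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}
```

In this sketch, calling rx_ring_refill() with failing_alloc after a few
packets have been consumed fills nothing and moves no counters, and a later
call with malloc restores the ring to full.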

I haven't heard of any further problems with my version of the driver (apart
from some unclear problems with 82559ER cards and interrupt acknowledgement).
I personally use this version of the driver on fairly heavily loaded, critical
servers.

Best regards,
					Andrey V. Savochkin