[realtek-bug] Another 1.10 bug (more important - with fix)
Fri, 14 Jul 2000 15:21:10 -0700
Periodically, under reasonable NFS load, a random machine
in my farm would stop talking. You could ping it, but the
responses were slow, often delayed until after subsequent
pings were received. You could log in (mine are headless)
with abysmal response times and poke around. The last
thing in the kernel buffer was always the following error:
eth0: Transmit error, Tx status 400820aa.
(a transmit abort). I found you could unwedge a stuck
machine by pinging it with longish packets (say 10k bytes),
at which point it printed:
eth0: Transmit timeout, status 0d 0000 media 00.
eth0: Tx queue start entry 56110 dirty entry 56106, full.
eth0: Tx descriptor 0 is 000804ae.
eth0: Tx descriptor 1 is 0008042d.
eth0: Tx descriptor 2 is 00082442. (queue head)
eth0: Tx descriptor 3 is 00082441.
eth0: MII #32 registers are: 1000 782d 0000 0000 05e1 40a1 0001 0000.
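
The status word in the first error message and the queue counters in the
timeout dump can be decoded by hand. Below is a minimal sketch of that
decoding. It assumes the usual RTL8139 transmit status layout (byte count
in the low 13 bits, host-owns 0x2000, underrun 0x4000, OK 0x8000, abort
0x40000000) and the four-entry Tx ring that the dump lists. The macro and
function names are illustrative, not the driver's own.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed RTL8139 Tx status bits; names are illustrative only. */
#define TX_SIZE_MASK 0x00001fffu  /* byte count of the frame */
#define TX_HOST_OWNS 0x00002000u
#define TX_UNDERRUN  0x00004000u
#define TX_STAT_OK   0x00008000u
#define TX_ABORTED   0x40000000u

#define NUM_TX_DESC 4u            /* the dump lists descriptors 0..3 */

/* Ring slot that a free-running queue counter refers to. */
unsigned tx_ring_slot(unsigned long counter)
{
    return (unsigned)(counter % NUM_TX_DESC);
}

/* The ring is full when all slots lie between dirty and current. */
int tx_ring_full(unsigned long cur_tx, unsigned long dirty_tx)
{
    /* Invariant: never more frames outstanding than there are slots. */
    assert(cur_tx - dirty_tx <= NUM_TX_DESC);
    return cur_tx - dirty_tx == NUM_TX_DESC;
}

/* Print the fields of one Tx status word. */
void describe_tx_status(uint32_t status)
{
    printf("size=%u own=%d underrun=%d ok=%d aborted=%d\n",
           (unsigned)(status & TX_SIZE_MASK),
           (status & TX_HOST_OWNS) != 0,
           (status & TX_UNDERRUN) != 0,
           (status & TX_STAT_OK) != 0,
           (status & TX_ABORTED) != 0);
}
```

Under those assumptions, describe_tx_status(0x400820aa) reports a 170-byte
frame with the abort bit set, tx_ring_slot(56106) gives 2 (the descriptor
marked "queue head"), and tx_ring_full(56110, 56106) is true, which matches
the "full" in the dump.
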
The problem is in how the transmit interrupt service routine responds to
the transmit abort state. At line 1066 of the 1.10 driver it does:
outl((TX_DMA_BURST<<8)|0x03000001, ioaddr + TxConfig);
I believe this should be replaced with:
outl((TX_DMA_BURST<<8), ioaddr + TxConfig);
Note the missing constant. I believe, according to the chip's docs,
that the '1' in it causes the aborted packet to be retransmitted,
but further down in the ISR the driver assumes that the packet
is done, discards the buffer, and allows the xmt entry to be reused.
I think this is the cause of the hang; the tx timeout clears it
and resets this state. Also, the '3' value appears to put
the transmitter into a state where it uses an illegal interframe gap
(another possible cause of problems).
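
To make the difference between the two register values concrete, here is a
small sketch that splits each value into the two fields discussed above. The
masks are assumptions taken from the description in this message (bit 0 as
the retransmit-after-abort bit, bits 24-25 as the interframe gap field), and
the TX_DMA_BURST value of 4 is also an assumption. Check both against your
copy of the driver and the chip's docs. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TX_DMA_BURST      4            /* assumed; use the driver's own value */
#define TXCFG_CLEAR_ABORT 0x00000001u  /* assumed: bit 0, retransmit aborted packet */
#define TXCFG_IFG_MASK    0x03000000u  /* assumed: bits 24-25, interframe gap */
#define TXCFG_IFG_SHIFT   24

/* Value written at line 1066 of the 1.10 driver. */
uint32_t txconfig_original(void)
{
    return ((uint32_t)TX_DMA_BURST << 8) | 0x03000001u;
}

/* Value proposed in this message. */
uint32_t txconfig_proposed(void)
{
    return (uint32_t)TX_DMA_BURST << 8;
}

/* Nonzero if the value asks the chip to retransmit the aborted packet. */
int txconfig_retransmits(uint32_t value)
{
    return (value & TXCFG_CLEAR_ABORT) != 0;
}

/* Raw contents of the interframe gap field. */
unsigned txconfig_ifg(uint32_t value)
{
    return (unsigned)((value & TXCFG_IFG_MASK) >> TXCFG_IFG_SHIFT);
}

/* Print one register value broken into the fields above. */
void describe_txconfig(const char *label, uint32_t value)
{
    assert(label != NULL);
    printf("%s: 0x%08lx retransmit=%d ifg=%u\n", label,
           (unsigned long)value, txconfig_retransmits(value),
           txconfig_ifg(value));
}
```

With these assumptions the original write is 0x03000401 (retransmit bit set,
gap field 3) and the proposed write is 0x00000400 (both fields clear), so
the change removes exactly the two bits in question and leaves the DMA burst
setting alone.
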
This fixed my problem, and it may also fix the mysterious hang other
people have reported. To get to this point I ported a lot of
the rtl8139too.c driver into my Linux 2.2 driver (spinlocks,
the BSD fixes, the extra 4-byte problem, etc.). This was the change
that fixed my problem, so there may be other stuff that should be
fixed too. I'm loath to fork a 3rd set of driver source. Donald,
is it appropriate for me to pass you my annotated source for you
to pick and choose changes from?