[eepro100] Transmit timed out with high Tx load

Andrew Pam xanni@glasswings.com.au
Thu Jan 31 00:18:00 2002


I have a router with six Intel PCI EtherExpress Pro100 adapters,
eth0 through eth5, interrupts as follows:

eth0 IRQ5, eth1 IRQ12, eth2 IRQ10, eth3 IRQ11, eth4 IRQ5, eth5 IRQ12

Output of "lspci":

00:00.0 Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 03)
00:04.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
00:04.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
00:04.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
00:04.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
00:07.0 RAID bus controller: CMD Technology Inc PCI0648 (rev 01)
00:09.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 0c)
00:0a.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 0c)
00:0b.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 0c)
00:0c.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 0c)
00:0d.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 0c)
00:0e.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 0c)
01:00.0 VGA compatible controller: ATI Technologies Inc 3D Rage IIC AGP (rev 7a)

The system is Red Hat 7.2 with kernels 2.4.9-7 and 2.4.17.  eth4 is not
presently in use, and IRQ5 is also shared with USB.  eth1, eth2, eth3 and
eth5 have no problems whatsoever, even under fairly heavy load.  eth0,
however, constantly has transmit timeouts and errors, regardless of
whether the USB driver module is loaded.
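
To make the sharing explicit, here is a small sketch that groups the
devices by IRQ.  The assignments are copied from the list above; the "usb"
entry is added only because IRQ5 is also shared with USB, and the function
name is made up for illustration.

```python
from collections import defaultdict

# IRQ assignments as listed in this message. "usb" is included because
# the message states that IRQ5 is also shared with USB.
irq_by_device = {
    "eth0": 5, "eth1": 12, "eth2": 10,
    "eth3": 11, "eth4": 5, "eth5": 12,
    "usb": 5,
}

def shared_irqs(assignments):
    """Return {irq: [devices]} for every IRQ used by more than one device."""
    groups = defaultdict(list)
    for device, irq in assignments.items():
        groups[irq].append(device)
    return {irq: sorted(devs) for irq, devs in groups.items() if len(devs) > 1}

shared = shared_irqs(irq_by_device)
for irq in sorted(shared):
    print(f"IRQ{irq}: {', '.join(shared[irq])}")
```

Note that eth1 and eth5 also share an interrupt (IRQ12) and work fine, so
sharing by itself does not explain the eth0 failures.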

With the stock eepro100 driver from kernels 2.4.9 and 2.4.17 (v1.09j-t)
the following errors are logged:

Jan 31 13:23:56 statistix kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 31 13:23:56 statistix kernel: eth0: Transmit timed out: status ffff  ffff at 9179585/9179613 command 0001a000.
Jan 31 13:23:56 statistix kernel: eepro100: wait_for_cmd_done timeout!
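
For anyone who wants to pull the numbers out of these messages, a rough
sketch follows.  The regular expression and field names are my own, not
anything from the driver.  I am assuming the two numbers after "at" are a
pair of Tx counters (oldest unfinished / next to use), so their difference
would be the number of packets still queued; the message itself does not
say this.

```python
import re

# Log text from this message, with the wrapped line joined back together.
LINE = ("eth0: Transmit timed out: status ffff  ffff at "
        "9179585/9179613 command 0001a000.")

# Hypothetical field names; the driver does not label these values.
PATTERN = re.compile(
    r"(?P<dev>\w+): Transmit timed out: status "
    r"(?P<status>[0-9a-f]{4})\s+(?P<cmd>[0-9a-f]{4}) at\s+"
    r"(?P<first>\d+)/(?P<second>\d+)"
)

def parse_timeout(line):
    """Extract device, status words and the two counters from one line."""
    m = PATTERN.search(line)
    if m is None:
        return None
    info = m.groupdict()
    # Assumption: the counters are oldest-unfinished / next-to-use.
    info["pending"] = int(info["second"]) - int(info["first"])
    # All-ones status words usually mean the register reads got no
    # answer from the device.
    info["all_ones"] = info["status"] == "ffff" and info["cmd"] == "ffff"
    return info

info = parse_timeout(LINE)
print(info["dev"], info["pending"], info["all_ones"])
```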

I compiled and installed the latest v1.19 drivers from www.scyld.com
and now get the following errors:

Jan 31 15:48:57 statistix kernel: Command 00ff was not immediately accepted, 10001 ticks!
Jan 31 15:49:01 statistix kernel: eth0: IRQ 5 is physically blocked! Failing back to low-rate polling.
Jan 31 15:49:11 statistix kernel: eth0: IRQ 5 is still blocked!
Jan 31 15:50:00 statistix kernel: Command 00ff was not immediately accepted, 10001 ticks!
Jan 31 15:50:01 statistix kernel: eth0: IRQ 5 is still blocked!
Jan 31 15:53:08 statistix kernel: eth0: Transmit timed out: status 0050  0000 at 985184/985198 commands 000c0000 000c0000 000c0000.
Jan 31 15:53:08 statistix kernel: eth0: Restarting the chip...

eth0 is transmitting on average 3 GiB (24 Gib) per day and receiving on
average 10 Mib per day.  (Due to asymmetric routing, most of the Rx data
is on eth3.)
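
As a rough guide, the daily totals above can be converted to average
rates as follows.  This is only a sketch: it assumes the traffic is spread
evenly over 24 hours, which says nothing about peak or burst rates, and
the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def average_bits_per_second(bits_per_day):
    """Average rate, assuming the traffic is spread evenly over a day."""
    return bits_per_day / SECONDS_PER_DAY

# Daily totals from this message: 3 GiB transmitted, 10 Mib received.
tx_bits_per_day = 3 * 2**30 * 8   # 3 GiB expressed in bits (= 24 Gib)
rx_bits_per_day = 10 * 2**20      # 10 Mib is already a bit count

tx_bps = average_bits_per_second(tx_bits_per_day)
rx_bps = average_bits_per_second(rx_bits_per_day)

print(f"Tx average: {tx_bps / 1e3:.0f} kbit/s")
print(f"Rx average: {rx_bps:.0f} bit/s")
```

The Tx average works out to roughly 298 kbit/s, so any saturation of the
100 Mbit/s link would have to come from short bursts rather than the
sustained rate.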

I will probably switch to the Intel e100 driver, but would appreciate
information as to the likelihood of a solution to this problem in the
eepro100 driver.

Regards,
	Andrew Pam
-- 
mailto:xanni@xanadu.net                         Andrew Pam
http://www.xanadu.com.au/                       Chief Scientist, Xanadu
http://www.glasswings.com.au/                   Technology Manager, Glass Wings
http://www.sericyb.com.au/                      Manager, Serious Cybernetics
http://two-cents-worth.com/?105347&EG		Donate two cents to our work!
P.O. Box 477, Blackburn VIC 3130 Australia	Phone +61 401 258 915