[eepro100-bug] i82559er problem

Pavan Sikka pavan.sikka@csiro.au
Fri Aug 9 03:40:01 2002


Background:

We have been using computer boards manufactured by EEPD (www.eepd.com)
that have worked very well for us over the last few years. Recently,
they changed the on-board network chip from the stock i82559 (PCI dev
id 0x1229) to the 82559ER (PCI dev id 0x1209). With this chip, the
network crashes as soon as the network traffic increases. We finally
traced the problem to outgoing traffic and can now reproduce it
using flood pinging.

Interestingly, when we install an external NIC containing the stock
i82558/9 chip, things work quite well.

Problem (using version 1.24 of the driver from www.scyld.com):

To demonstrate the problem:

1. On the "problem" machine:
     ping -f -s 1024 <remote-machine>

2. On the remote machine:
     ping -f -s 1024 <problem-machine>

3. Keep some light, random background network traffic running.

This causes the network to crash in a few seconds.

We changed some of the driver parameters as follows:

Tx/Rx DMA burst length to 127
Tx/Rx Ring size from 32 to 128
Tx Queue size from 12 to 100
Tx Queue unfull from 8 to 100
Tx Timeout from 2*Hz to 4*Hz

With these new parameters, the network survives for a few minutes.

Questions:

1. Are there any known issues with the i82559ER chip?

2. Given the info below, do you think the EEPROM contents are sane?

3. I have tried very hard to get the software manual for this chip but
have not yet succeeded. I will be happy to sign an NDA, but I cannot seem
to find anyone at Intel to talk to (I am in Australia). Could you
provide a contact at Intel who could facilitate this?

4. Any comments at all?

Here is some detailed info about the chip status, collected with the
diagnostic utilities from www.scyld.com:

./eepro100-diag -f -a -m (before starting the pings, when network is
ok):

eepro100-diag.c:v2.09 7/15/2002 Donald Becker (becker@scyld.com)
  http://www.scyld.com/diag/index.html
Index #1: Found a Intel 82559ER EtherExpressPro/100+ adapter at 0xe400.
i82557 chip registers at 0xe400:
   00000050 0ff30160 00000000 00080002 182541e1 00000600
   No interrupt sources are pending.
    The transmit unit state is 'Suspended'.
    The receive unit state is 'Ready'.
   This status is normal for an activated but idle interface.
Primary transceiver is MII PHY #1. MII PHY #1 transceiver registers:
    3100 782d 02a8 0154 05e1 41e1 0003 0000
    0000 0000 0000 0000 0000 0000 0000 0000
    0203 0000 0001 0f9a 0000 0003 21de 0003
    0000 0000 0000 0000 0000 0000 0000 0000.
  Basic mode control register 0x3100: Auto-negotiation enabled.
  Basic mode status register 0x782d ... 782d.
    Link status: established.
    Capable of  100baseTx-FD 100baseTx 10baseT-FD 10baseT.
    Able to perform Auto-negotiation, negotiation complete.
  Vendor ID is 00:aa:00:--:--:--, model 21 rev. 4.
    No specific information is known about this transceiver type.
  I'm advertising 05e1: Flow-control 100baseTx-FD 100baseTx 10baseT-FD 10baseT
    Advertising no additional info pages.
    IEEE 802.3 CSMA/CD protocol.
  Link partner capability is 41e1: 100baseTx-FD 100baseTx 10baseT-FD 10baseT.
    Negotiation  completed.
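
For anyone who wants to check the PHY registers by hand, here is a
minimal Python sketch that decodes the status and advertisement words
from the dump above. It assumes the standard MII register layout
(IEEE 802.3 clause 22); the table names and the decode() helper are
mine and are not part of eepro100-diag.

```python
# Bit positions assume the standard MII register layout (IEEE 802.3 clause 22).
BMSR_BITS = {
    14: "100baseTx-FD",
    13: "100baseTx",
    12: "10baseT-FD",
    11: "10baseT",
    5: "autoneg complete",
    3: "autoneg capable",
    2: "link up (latched low)",
}

# Advertisement (register 4) and link partner ability (register 5) share
# these bit positions.
ADVERT_BITS = {
    10: "Flow-control",
    8: "100baseTx-FD",
    7: "100baseTx",
    6: "10baseT-FD",
    5: "10baseT",
}


def decode(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & (1 << bit)]


def selector(value):
    """Low five bits of an advertisement word; 1 means IEEE 802.3."""
    return value & 0x1F


if __name__ == "__main__":
    print("BMSR 782d:", decode(0x782D, BMSR_BITS))
    print("Advertising 05e1:", decode(0x05E1, ADVERT_BITS))
    print("Link partner 41e1:", decode(0x41E1, ADVERT_BITS))
    print("Selector:", selector(0x05E1))
```

Decoded this way, 05e1 gives the same five capabilities that the tool
prints, and 41e1 gives the four speed/duplex modes without flow control.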

./eepro100-diag -f -a -m (after the network has crashed):

eepro100-diag.c:v2.09 7/15/2002 Donald Becker (becker@scyld.com)
  http://www.scyld.com/diag/index.html
Index #1: Found a Intel 82559ER EtherExpressPro/100+ adapter at 0xe400.
i82557 chip registers at 0xe400:
   00800050 0ff30f00 00000000 00080002 182541e1 00000358
   No interrupt sources are pending.
    The transmit unit state is 'Suspended'.
    The receive unit state is 'Ready'.
   This status is normal for an activated but idle interface.
  The Command register has an unprocessed command 0080(?!).
Primary transceiver is MII PHY #1. MII PHY #1 transceiver registers:
    3100 7829 02a8 0154 05e1 41e1 0003 0000
    0000 0000 0000 0000 0000 0000 0000 0000
    0203 0000 0001 0b1f 0000 0002 13a8 0002
    0000 0000 0000 0000 0000 0000 0000 0000.
  Basic mode control register 0x3100: Auto-negotiation enabled.
  Basic mode status register 0x7829 ... 782d.
    Link status: previously broken, but now reestablished.
    Capable of  100baseTx-FD 100baseTx 10baseT-FD 10baseT.
    Able to perform Auto-negotiation, negotiation complete.
  Vendor ID is 00:aa:00:--:--:--, model 21 rev. 4.
    No specific information is known about this transceiver type.
  I'm advertising 05e1: Flow-control 100baseTx-FD 100baseTx 10baseT-FD 10baseT
    Advertising no additional info pages.
    IEEE 802.3 CSMA/CD protocol.
  Link partner capability is 41e1: 100baseTx-FD 100baseTx 10baseT-FD 10baseT.
    Negotiation  completed.
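
The only difference in the first chip register between the two dumps
is the upper half-word (00000050 before, 00800050 after). The sketch
below splits that word and decodes the unit states. It assumes the low
16 bits are the SCB status word and the high 16 bits the SCB command
word, and that the state encodings are as I understand them for the
8255x family; both are assumptions on my part, although they agree with
what the diagnostic tool prints. The function names are hypothetical.

```python
# Assumed encodings of the command unit (CU) and receive unit (RU) state
# fields in the SCB status word; only the values relevant here are listed.
CU_STATES = {0: "Idle", 1: "Suspended", 2: "Active", 3: "Active"}
RU_STATES = {0: "Idle", 1: "Suspended", 2: "No resources", 4: "Ready"}


def split_scb(word):
    """Split the first 32-bit register into (status, command) half-words."""
    return word & 0xFFFF, (word >> 16) & 0xFFFF


def decode_status(status):
    """Return (CU state, RU state) names for an SCB status word."""
    cu = (status >> 6) & 0x3   # assumed: CU state in bits 7:6
    ru = (status >> 2) & 0xF   # assumed: RU state in bits 5:2
    return CU_STATES.get(cu, "Unknown"), RU_STATES.get(ru, "Unknown")


if __name__ == "__main__":
    for label, word in (("before", 0x00000050), ("after", 0x00800050)):
        status, command = split_scb(word)
        cu, ru = decode_status(status)
        print(f"{label}: status={status:04x} command={command:04x} "
              f"CU={cu} RU={ru}")
```

Under these assumptions both dumps decode to the same unit states
(transmit suspended, receive ready), and the crashed dump differs only
in the command half-word 0080 that the tool flags as unprocessed.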

Some interesting bits from /var/log/messages when the network crashes:

Aug  9 16:32:49 load2 kernel: Command 80 was not immediately accepted, 10001 ticks!
Aug  9 16:32:54 load2 kernel: eth0: Transmit timed out: status 0090 0080 at 217370/217374 commands 000c0000 000c0000 000c0000.
Aug  9 16:32:54 load2 kernel: Command 80 was not immediately accepted, 10001 ticks!
Aug  9 16:32:54 load2 kernel: eth0: Restarting the chip...
Aug  9 16:32:56 load2 kernel: Command 80 was not immediately accepted, 10001 ticks!
Aug  9 16:33:02 load2 kernel: eth0: Transmit timed out: status 0050 0080 at 217540/217543 commands 000c0000 000c0000 400c0000.
Aug  9 16:33:02 load2 kernel: Command 80 was not immediately accepted, 10001 ticks!
Aug  9 16:33:02 load2 kernel: eth0: Restarting the chip...
Aug  9 16:33:04 load2 kernel: Command 80 was not immediately accepted, 10001 ticks!
Aug  9 16:33:10 load2 kernel: eth0: Transmit timed out: status 0050 0080 at 217709/217712 commands 000c0000 000c0000 400c0000.
Aug  9 16:33:10 load2 kernel: Command 80 was not immediately accepted, 10001 ticks!
Aug  9 16:33:10 load2 kernel: eth0: Restarting the chip...
Aug  9 16:33:12 load2 kernel: Command 80 was not immediately accepted, 10001 ticks!
Aug  9 16:33:18 load2 kernel: eth0: Transmit timed out: status 0050 0080 at 217879/217882 commands 000c0000 000c0000 400c0000.
Aug  9 16:33:18 load2 kernel: Command 80 was not immediately accepted, 10001 ticks!
Aug  9 16:33:18 load2 kernel: eth0: Restarting the chip...
Aug  9 16:33:20 load2 kernel: Command 80 was not immediately accepted, 10001 ticks!
Aug  9 16:33:26 load2 kernel: eth0: Transmit timed out: status 0050 0080 at 218045/218048 commands 000c0000 000c0000 400c0000.
Aug  9 16:33:26 load2 kernel: Command 80 was not immediately accepted, 10001 ticks!
Aug  9 16:33:26 load2 kernel: eth0: Restarting the chip...
Aug  9 16:33:28 load2 kernel: Command 80 was not immediately accepted, 10001 ticks!
Aug  9 16:33:34 load2 kernel: eth0: Transmit timed out: status 0050 0080 at 218206/218209 commands 000c0000 000c0000 400c0000.
Aug  9 16:33:34 load2 kernel: Command 80 was not immediately accepted, 10001 ticks!
Aug  9 16:33:34 load2 kernel: eth0: Restarting the chip...
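
To make the pattern in these messages easier to see, here is a rough
Python sketch that pulls out the "Transmit timed out" entries and
reports the time between them. The sample text is the first three such
entries from the log above, joined onto one line each. I assume the two
numbers after "at" are the driver's completed and queued Tx counters,
so their difference is the number of outstanding packets; that reading
is an assumption, and the function and field names are mine.

```python
import re
from datetime import datetime

LOG = """\
Aug  9 16:32:54 load2 kernel: eth0: Transmit timed out: status 0090 0080 at 217370/217374 commands 000c0000 000c0000 000c0000.
Aug  9 16:32:54 load2 kernel: eth0: Restarting the chip...
Aug  9 16:33:02 load2 kernel: eth0: Transmit timed out: status 0050 0080 at 217540/217543 commands 000c0000 000c0000 400c0000.
Aug  9 16:33:10 load2 kernel: eth0: Transmit timed out: status 0050 0080 at 217709/217712 commands 000c0000 000c0000 400c0000.
"""

PATTERN = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d) \S+ kernel: (\w+): Transmit timed out: "
    r"status ([0-9a-f]{4}) ([0-9a-f]{4}) at (\d+)/(\d+)"
)


def parse_timeouts(text, year=2002):
    """Return one dict per 'Transmit timed out' line in `text`."""
    events = []
    for line in text.splitlines():
        m = PATTERN.match(line)
        if not m:
            continue
        stamp, dev, status, command, done, queued = m.groups()
        # syslog omits the year and pads the day with spaces; normalise both.
        when = datetime.strptime(f"{year} {' '.join(stamp.split())}",
                                 "%Y %b %d %H:%M:%S")
        events.append({
            "when": when,
            "dev": dev,
            "status": int(status, 16),
            "command": int(command, 16),
            # Assumed meaning: queued minus completed Tx counter.
            "outstanding": int(queued) - int(done),
        })
    return events


def intervals(events):
    """Seconds between consecutive timeout events."""
    return [(b["when"] - a["when"]).total_seconds()
            for a, b in zip(events, events[1:])]


if __name__ == "__main__":
    found = parse_timeouts(LOG)
    print("intervals (s):", intervals(found))
    print("outstanding:", [e["outstanding"] for e in found])
```

On the sample the timeouts are 8 seconds apart, and the two counters
differ by only 3 or 4 each time.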

Output of lspci (very verbose):

00:00.0 Host bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 03)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+
	Latency: 32
	Region 0: Memory at d0000000 (32-bit, prefetchable) [size=64M]
	Capabilities: [a0] AGP version 1.0
		Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2
		Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>

00:01.0 PCI bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 03) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: d4000000-d5ffffff
	Prefetchable memory behind bridge: fff00000-000fffff
	BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B+

00:07.0 ISA bridge: Intel Corp. 82371AB/EB/MB PIIX4 ISA (rev 02)
	Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0

00:07.1 IDE interface: Intel Corp. 82371AB/EB/MB PIIX4 IDE (rev 01) (prog-if 80 [Master])
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32
	Region 4: I/O ports at f000 [size=16]

00:07.2 USB Controller: Intel Corp. 82371AB/EB/MB PIIX4 USB (rev 01) (prog-if 00 [UHCI])
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32
	Interrupt: pin D routed to IRQ 11
	Region 4: I/O ports at e000 [size=32]

00:07.3 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ACPI (rev 02)
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Interrupt: pin ? routed to IRQ 9

00:10.0 Ethernet controller: Intel Corp. 82559ER (rev 09)
	Subsystem: Intel Corp.: Unknown device 000c
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 32 (2000ns min, 14000ns max), cache line size 08
	Interrupt: pin A routed to IRQ 15
	Region 0: Memory at d7020000 (32-bit, non-prefetchable) [size=4K]
	Region 1: I/O ports at e400 [size=64]
	Region 2: Memory at d7000000 (32-bit, non-prefetchable) [size=128K]
	Expansion ROM at <unassigned> [disabled] [size=1M]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=2 PME-

00:12.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
	Subsystem: Unknown device 6000:0311
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (4000ns min, 10000ns max)
	Interrupt: pin A routed to IRQ 10
	Region 0: Memory at d7021000 (32-bit, prefetchable) [size=4K]
	Capabilities: [44] Vital Product Data
	Capabilities: [4c] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:12.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
	Subsystem: Unknown device 6000:0311
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (1000ns min, 63750ns max)
	Interrupt: pin A routed to IRQ 10
	Region 0: Memory at d7022000 (32-bit, prefetchable) [size=4K]
	Capabilities: [44] Vital Product Data
	Capabilities: [4c] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

01:00.0 VGA compatible controller: Chips and Technologies: Unknown device 0c30 (rev 61) (prog-if 00 [VGA])
	Subsystem: Chips and Technologies: Unknown device 0c30
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
	Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Interrupt: pin A routed to IRQ 15
	Region 0: Memory at d4000000 (32-bit, non-prefetchable) [size=16M]
	Expansion ROM at <unassigned> [disabled] [size=256K]

./eepro100-diag -f -ee

eepro100-diag.c:v2.09 7/15/2002 Donald Becker (becker@scyld.com)
  http://www.scyld.com/diag/index.html
Index #1: Found a Intel 82559ER EtherExpressPro/100+ adapter at 0xe400.
EEPROM contents, size 64x16:
     00: e000 2333 3801 0303 0000 0201 4701 0000
   0x08: 1162 0151 40a0 000c 8086 0000 0000 0000
       ...
   0x30: 002c 0000 0000 0000 0000 0000 0000 0000
   0x38: 0000 0000 0000 0000 0000 0000 0000 5f70
  The EEPROM checksum is correct.
Intel EtherExpress Pro 10/100 EEPROM contents:
   Station address 00:E0:33:23:01:38.
   Board assembly 116201-081, Physical connectors present: RJ45
   Primary interface chip i82555 PHY #1.
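
As a cross-check on the EEPROM dump, the sketch below rebuilds the
station address and board assembly number from the raw words shown
above. It assumes that the address is stored low byte first in words
0 to 2 and that the assembly number comes from words 8 and 9; both
assumptions reproduce the values the tool prints. The function names
are mine. The checksum is not recomputed here because the dump elides
rows 0x10 to 0x2f.

```python
# First two rows of the EEPROM dump (words 0x00 to 0x0f).
WORDS = [
    0xE000, 0x2333, 0x3801, 0x0303, 0x0000, 0x0201, 0x4701, 0x0000,
    0x1162, 0x0151, 0x40A0, 0x000C, 0x8086, 0x0000, 0x0000, 0x0000,
]


def station_address(words):
    """Assemble the MAC address from words 0..2, low byte first."""
    octets = []
    for word in words[:3]:
        octets.append(word & 0xFF)
        octets.append(word >> 8)
    return ":".join(f"{octet:02X}" for octet in octets)


def board_assembly(words):
    """Format words 8 and 9 as the board assembly number (assumed layout)."""
    prefix = words[8]          # four hex digits
    middle = words[9] >> 8     # two hex digits
    suffix = words[9] & 0xFF   # printed as a three-digit decimal
    return f"{prefix:04x}{middle:02x}-{suffix:03d}"


if __name__ == "__main__":
    print("Station address", station_address(WORDS))
    print("Board assembly", board_assembly(WORDS))
```

Both values come out the same as in the tool output above
(00:E0:33:23:01:38 and 116201-081), so the tool's decoding of these
fields agrees with the raw words in the dump.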

Thanks,

-Pavan