[eepro100] Sudden "Network unreachable" ...

Gerd Aschemann gerd@aschemann.net
Wed Oct 9 16:18:01 2002


Hi,

every now and then the EtherExpress PRO/100 in my IBM ThinkPad A21p
loses its IP connection (if not the Ethernet connection itself?). I had
this problem with kernel 2.4.16 (SuSE 7.3) and have it again with
2.4.19 (SuSE 8.1). I had hoped it would vanish with the new kernel, but
it still occurs. Since I have these problems regularly, I keep two pings
running (one to a local address and one to a remote one). When the
problem occurs, both of them stop displaying the per-packet information
(ICMP sequence number, round-trip time, etc.), i.e., they no longer
receive any echo replies. If I unplug the TP patch cable and plug it
back in, the connection usually comes back. After that, sometimes no
packets get through, sometimes a few, and sometimes it works again for
hours. Then I have to unplug and replug again. Sometimes I do that five
times per minute. Sometimes the link works for weeks ... Since it is a
laptop, I seldom have it connected in the same place for more than a
day, but I was on vacation some weeks ago and found the system running
fine when I returned. Instead of pulling the plug I sometimes try
switching the Ethernet switch off and on, and I have tried different
switches. The problem occurs more frequently in the place where I have
three different low-cost switches (a 5-port 10/100 one from LevelOne,
an 8-port 10/100 one from LevelOne and an 8-port 10/100 no-name
product). At my current customer's site it occurs more rarely; I do not
know what kind of switch they have, although I suspect it is only a
10 Mbit system.

Sometimes I get:
  From lap1.aschemann.intranet (192.168.2.3) icmp_seq=35337 Destination Host Unreachable

Sometimes the ping restarts transmitting without any action ...
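
The two-ping watch described above could be scripted so that each outage
and recovery gets a timestamp that can be lined up with the mii-diag log.
This is only a minimal sketch: it assumes the Linux iputils `ping` (its
`-c` and `-W` options), and the function names and the example addresses
in the comment are hypothetical, not something taken from the setup
described in this mail.

```python
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Return True if a single ICMP echo request to host is answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(hosts, probe=ping_once, rounds=None, interval_s=1.0, log=print):
    """Probe every host once per round and log only state changes.

    rounds=None means "run until interrupted"; probe is injectable so the
    loop can be exercised without a network.
    """
    state = {}
    done = 0
    while rounds is None or done < rounds:
        for host in hosts:
            ok = probe(host)
            if state.get(host) != ok:
                stamp = time.strftime("%H:%M:%S")
                log(f"{stamp} {host} {'reachable' if ok else 'UNREACHABLE'}")
                state[host] = ok
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)
    return state


# Example call (hypothetical local and remote addresses):
# watch(["192.168.2.1", "192.0.2.1"])
```

If both hosts flip to UNREACHABLE at the same second, that would point at
the local interface rather than at anything further along the route.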


Here is what I have found out so far:

	# mii-diag -w
	Using the default interface 'eth0'.
	Basic registers of MII PHY #1:  1000 782d 02a8 0154 05e1 45e1 0003 0000.
	 The autonegotiated capability is 01e0.
	The autonegotiated media type is 100baseTx-FD.
	 Basic mode control register 0x1000: Auto-negotiation enabled.
	 You have link beat, and everything is working OK.
	 Your link partner advertised 45e1: Flow-control 100baseTx-FD 100baseTx 10baseT-FD 10baseT, w/ 802.3X flow control.
	   End of basic transceiver information.
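
For readers who want to check the autonegotiation result by hand: the
fifth and sixth words of the register dump above (05e1 and 45e1) are the
local advertisement (MII register 4) and the link partner ability
(register 5). The sketch below picks the best common mode using the
usual IEEE 802.3 priority order. The function and table names are made
up for illustration; this is not how mii-diag itself is implemented.

```python
# Technology ability bits shared by MII registers 4 and 5,
# listed in autonegotiation priority order (best first).
PRIORITY = [
    (0x0100, "100baseTx-FD"),
    (0x0200, "100baseT4"),
    (0x0080, "100baseTx"),
    (0x0040, "10baseT-FD"),
    (0x0020, "10baseT"),
]
PAUSE = 0x0400  # 802.3x flow control (pause) capability


def abilities(reg):
    """List the media types advertised in an ability register value."""
    return [name for mask, name in PRIORITY if reg & mask]


def resolve(local_adv, partner_adv):
    """Return the highest-priority mode both sides advertise, or None."""
    common = local_adv & partner_adv
    for mask, name in PRIORITY:
        if common & mask:
            return name
    return None


local_adv, partner_adv = 0x05E1, 0x45E1  # values from the dump above
print("partner offers:", " ".join(abilities(partner_adv)))
print("flow control:", bool(partner_adv & PAUSE))
print("negotiated:", resolve(local_adv, partner_adv))
```

With the values from the dump this yields 100baseTx-FD, which agrees
with what mii-diag reports.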

	Monitoring the MII transceiver status.
	19:31:20.151  Baseline value of MII BMSR (basic mode status register) is 782d.
Then I lost connectivity and pulled the plug after some seconds ...
	19:43:01.313  MII BMSR now 7809:   no link, NWay busy, No Jabber (0000).
	19:43:04.823  MII BMSR now 7829:   no link, NWay done, No Jabber (45e1).
	   New link partner capability is 45e1 0003: 10/100 switch w/ flow control.
	19:43:04.832  MII BMSR now 782d: Good link, NWay done, No Jabber (45e1).
Here again: IP connectivity was lost but the link status remained the
same, so I pulled the plug again:
	19:57:11.032  MII BMSR now 7809:   no link, NWay busy, No Jabber (0000).
	19:57:15.542  MII BMSR now 7829:   no link, NWay done, No Jabber (45e1).
	   New link partner capability is 45e1 0003: 10/100 switch w/ flow control.
	19:57:15.552  MII BMSR now 782d: Good link, NWay done, No Jabber (45e1).
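
The BMSR values in the log can be decoded by hand from the standard MII
status register bits. The following sketch does that for the three values
seen above; the bit positions are the standard ones for MII register 1,
while the function names and the summary wording are chosen here only for
illustration.

```python
# Status bits of the MII basic mode status register (register 1).
BMSR_BITS = {
    0x0020: "autonegotiation complete",
    0x0010: "remote fault",
    0x0008: "autonegotiation capable",
    0x0004: "link up",
    0x0002: "jabber detected",
}


def decode_bmsr(value):
    """Map each status flag name to True/False for a BMSR value."""
    return {name: bool(value & mask) for mask, name in BMSR_BITS.items()}


def summarize(value):
    """Render a BMSR value roughly in the style of the mii-diag log."""
    flags = decode_bmsr(value)
    link = "link up" if flags["link up"] else "no link"
    nway = "NWay done" if flags["autonegotiation complete"] else "NWay busy"
    jabber = "jabber" if flags["jabber detected"] else "no jabber"
    return f"{value:04x}: {link}, {nway}, {jabber}"


for bmsr in (0x782D, 0x7809, 0x7829):  # values taken from the log above
    print(summarize(bmsr))
```

Read this way, the register only changes when the cable is pulled; during
the outages themselves it stays at 782d (good link), which suggests that
the transceiver still sees a link while no traffic gets through.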

and
	# lspci -vv
	...
	00:03.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 09)
		Subsystem: Intel Corp. EtherExpress PRO/100+ MiniPCI
		Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
		Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
		Latency: 66 (2000ns min, 14000ns max), cache line size 08
		Interrupt: pin A routed to IRQ 11
		Region 0: Memory at f0120000 (32-bit, non-prefetchable) [size=4K]
		Region 1: I/O ports at 1800 [size=64]
		Region 2: Memory at f0100000 (32-bit, non-prefetchable) [size=128K]
		Expansion ROM at <unassigned> [disabled] [size=1M]
		Capabilities: [dc] Power Management version 2
			Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
			Status: D0 PME-Enable- DSel=0 DScale=2 PME-
	 ...

and
	# dmesg
	...
	eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html
	eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
	PCI: Found IRQ 11 for device 00:03.0
	PCI: Sharing IRQ 11 with 00:03.1
	eth0: OEM i82557/i82558 10/100 Ethernet, 00:03:47:0F:96:11, IRQ 11.
	  Board assembly 000695-001, Physical connectors present: RJ45
	  Primary interface chip i82555 PHY #1.
	  General self-test: passed.
	  Serial sub-system self-test: passed.
	  Internal registers self-test: passed.
	  ROM checksum self-test: passed (0xdbd8681d).
	...

Since the driver seems to be very old, I tried newer ones:

a) The one (1.25) found at scyld did not compile: it threw a lot of
   errors, so I postponed that attempt ...
b) Another version (for 2.4, although the link on
   ftp://ftp.scyld.com/pub/network/eepro100.html points to 2.3) from
   ftp://ftp.sw.com.sg/pub/Linux/people/saw/kernel/v2.4/:

    This one seemed promising since it contains a lot more code,
    including handling for a receiver lock-up bug ("Receiver lock-up
    bug exists -- enabling work-around."). It compiled after some small
    tweaks (the patch is appended below) but still does not solve my
    problem ...

I hope somebody can help with this problem. Maybe I will still try to
compile the latest version of the driver from a)!

Regards,
--
Gerd Aschemann --- Publishing means changing (Carmen Thomas)


*** eepro100.c.R1.36	Wed Oct  9 21:23:45 2002
--- eepro100.c	Wed Oct  9 22:09:05 2002
***************
*** 99,105 ****
  #include <linux/string.h>
  #include <linux/errno.h>
  #include <linux/ioport.h>
! #include <linux/malloc.h>
  #include <linux/interrupt.h>
  #include <linux/timer.h>
  #include <linux/pci.h>
--- 99,105 ----
  #include <linux/string.h>
  #include <linux/errno.h>
  #include <linux/ioport.h>
! #include <linux/slab.h>
  #include <linux/interrupt.h>
  #include <linux/timer.h>
  #include <linux/pci.h>
***************
*** 131,144 ****
  #define RUN_AT(x) (jiffies + (x))

  /* ACPI power states don't universally work (yet) */
! #ifndef CONFIG_EEPRO100_PM
  #undef pci_set_power_state
  #define pci_set_power_state null_set_power_state
  static inline int null_set_power_state(struct pci_dev *dev, int state)
  {
  	return 0;
  }
! #endif /* CONFIG_EEPRO100_PM */

  #define netdevice_start(dev)
  #define netdevice_stop(dev)
--- 131,144 ----
  #define RUN_AT(x) (jiffies + (x))

  /* ACPI power states don't universally work (yet) */
! #ifndef CONFIG_PM
  #undef pci_set_power_state
  #define pci_set_power_state null_set_power_state
  static inline int null_set_power_state(struct pci_dev *dev, int state)
  {
  	return 0;
  }
! #endif /* CONFIG_PM */

  #define netdevice_start(dev)
  #define netdevice_stop(dev)
***************
*** 515,521 ****
  static int eepro100_init_one(struct pci_dev *pdev,
  		const struct pci_device_id *ent);
  static void eepro100_remove_one (struct pci_dev *pdev);
! #ifdef CONFIG_EEPRO100_PM
  static void eepro100_suspend (struct pci_dev *pdev);
  static void eepro100_resume (struct pci_dev *pdev);
  #endif
--- 515,521 ----
  static int eepro100_init_one(struct pci_dev *pdev,
  		const struct pci_device_id *ent);
  static void eepro100_remove_one (struct pci_dev *pdev);
! #ifdef CONFIG_PM
  static void eepro100_suspend (struct pci_dev *pdev);
  static void eepro100_resume (struct pci_dev *pdev);
  #endif
***************
*** 2132,2138 ****
  	sp->rx_mode = new_rx_mode;
  }
  
! #ifdef CONFIG_EEPRO100_PM
  static void eepro100_suspend(struct pci_dev *pdev)
  {
  	struct net_device *dev = pdev->driver_data;
--- 2132,2138 ----
  	sp->rx_mode = new_rx_mode;
  }
  
! #ifdef CONFIG_PM
  static void eepro100_suspend(struct pci_dev *pdev)
  {
  	struct net_device *dev = pdev->driver_data;
***************
*** 2164,2170 ****
  	sp->flow_ctrl = sp->partner = 0;
  	set_rx_mode(dev);
  }
! #endif /* CONFIG_EEPRO100_PM */

  static void __devexit eepro100_remove_one (struct pci_dev *pdev)
  {
--- 2164,2170 ----
  	sp->flow_ctrl = sp->partner = 0;
  	set_rx_mode(dev);
  }
! #endif /* CONFIG_PM */

  static void __devexit eepro100_remove_one (struct pci_dev *pdev)
  {
***************
*** 2195,2202 ****
  		PCI_ANY_ID, PCI_ANY_ID, },
  	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ID1030,
  		PCI_ANY_ID, PCI_ANY_ID, },
! 	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82820FW_4,
! 		PCI_ANY_ID, PCI_ANY_ID, },
  	{ 0,}
  };
  MODULE_DEVICE_TABLE(pci, eepro100_pci_tbl);
--- 2195,2202 ----
  		PCI_ANY_ID, PCI_ANY_ID, },
  	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ID1030,
  		PCI_ANY_ID, PCI_ANY_ID, },
! /* 	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82820FW_4, */
! /* 		PCI_ANY_ID, PCI_ANY_ID, }, */
  	{ 0,}
  };
  MODULE_DEVICE_TABLE(pci, eepro100_pci_tbl);
***************
*** 2206,2212 ****
  	id_table:	eepro100_pci_tbl,
  	probe:		eepro100_init_one,
  	remove:		eepro100_remove_one,
! #ifdef CONFIG_EEPRO100_PM
  	suspend:	eepro100_suspend,
  	resume:		eepro100_resume,
  #endif
--- 2206,2212 ----
  	id_table:	eepro100_pci_tbl,
  	probe:		eepro100_init_one,
  	remove:		eepro100_remove_one,
! #ifdef CONFIG_PM
  	suspend:	eepro100_suspend,
  	resume:		eepro100_resume,
  #endif