[eepro100] Power glitch disables interface until machine is UNPLUGGED.

Donald Becker becker@scyld.com
Fri Oct 25 18:37:01 2002


On Tue, 22 Oct 2002, Mike Herrick wrote:

> This weekend we had a power glitch which caused all of the PCs to reboot.
> PCs that we use which had the Tyan S2425 (Tomcat 815ef) motherboard, with
> two eepro100 NICs onboard failed to recover in an interesting way: the PC
> rebooted, but the eth0 interface cannot be used.  The eth1 interface is
> fine.  A soft reboot of the system doesn't clear the problem, a power off/on
> cycle does not clear the problem and a hard reset does not clear the
> problem!

The NIC read an incorrect configuration from the EEPROM, which hosed
its operation.
Because the NIC is wired for wake-on-LAN, it always gets stand-by power
and never re-reads the EEPROM.

The only solution is a hard power-off: physically remove power from the
machine.

> The problem clears up only after UNPLUGGING the machine from the power
> supply.

Yup.

> All three PCs display the same behavior.  After briefly interrupting power
> on one of the PCs about 10 times, I was able to reproduce the problem.
> 
> My questions:
> 1) Is there any way to prevent this type of problem (i.e. BIOS setting,
> EEPROM setting)?
> 2) Assuming I can't prevent it, is there a way to automatically detect
> and recover from the problem (i.e. read an MII register and do some
> type of reset)?

No, it's impossible to recover.  Compare this to the "SLEEP" bug.  The
driver cannot fix the problem, because there is no way to trigger the
chip to re-read basic settings from the EEPROM.  The only solution is to
reprogram the EEPROM and hard-power-cycle the machine.  A frequent
problem is that people don't believe me when I tell them that they need
to do this -- everything else is cleared with soft-off.
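
Before the hard power cycle it can help to confirm that the EEPROM image
itself is sane.  What follows is only a minimal sketch, not part of any
existing tool.  It assumes the EEPROM words have already been obtained
as a list of 16-bit integers (for example, transcribed by hand from a
diagnostic dump; no parsing is shown), and the function name is
hypothetical.  The check mirrors the one in the eepro100 driver, which
treats the image as valid when the 16-bit sum of all words is 0xBABA.

```python
# Value the eepro100 driver expects for the 16-bit sum of all EEPROM words.
EEPROM_MAGIC_SUM = 0xBABA


def eeprom_checksum_ok(words):
    """Return True if the 16-bit sum of all EEPROM words equals 0xBABA.

    `words` is a hypothetical input: the EEPROM contents as a list of
    integers in the range 0..0xFFFF, however they were obtained.
    """
    total = 0
    for w in words:
        if not 0 <= w <= 0xFFFF:
            raise ValueError("EEPROM words must be 16-bit values")
        total = (total + w) & 0xFFFF  # wrap like a u16 accumulator
    return total == EEPROM_MAGIC_SUM
```

A passing checksum only says that the stored image is consistent.  It
says nothing about what the chip latched at power-up, so it does not
replace unplugging the machine.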

> Tyan S2425 motherboard
> According to the manual: one LAN interface is provided via 82599
> [sic?]

Yup, typo.

> controller and the other one via Intel's ICH2 (8252EM) [sic?].
> When I look at the motherboard, I see an 82559 and 82562EM.

Another typo.

> eth0: Intel Pro/100 V Network at 0xc8000000, 00:E0:81:20:2B:72, IRQ 7.
>   Board assembly 000000-000, Physical connectors present: RJ45

The on-motherboard NIC, with no assembly ID.

> eth1: Intel i82559 rev 8 at 0xc8002000, 00:E0:81:20:2B:73, IRQ 10.
>   Board assembly 567812-052, Physical connectors present: RJ45
>   Primary interface chip i82555 PHY #1.
>   General self-test: passed.
>   Serial sub-system self-test: passed.
>   Internal registers self-test: passed.
>   ROM checksum self-test: passed (0x04f4518b).
...

> mii-diag -a --force eth0:
> mii-diag.c:v2.05 7/13/2002 Donald Becker (becker@scyld.com)
>  http://www.scyld.com/diag/index.html
>   Using the old SIOCGMIIPHY value on PHY 1 (BMCR 0x0000).
>   No MII transceiver present!.

ERrrrkkkk.
Is this the broken chip?

What does 'eepro100-diag -eemm' report on failure?
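
On the detection half of question 2: recovery is not possible, but the
condition could at least be flagged so that someone knows to pull the
plug.  This is a minimal sketch under a loud assumption: that the failed
state always shows up as the "No MII transceiver present" line seen in
the mii-diag output quoted above.  That has not been confirmed here.
The function names are hypothetical, and the command line is the one
from the report.

```python
import subprocess

# Assumption: this line in mii-diag output marks the failed state.
NO_PHY_MARKER = "No MII transceiver present"


def phy_missing(diag_output):
    """Return True if the mii-diag output text reports no MII transceiver."""
    return NO_PHY_MARKER in diag_output


def check_interface(ifname):
    """Run 'mii-diag -a --force <ifname>' and report whether the PHY is missing.

    Assumes mii-diag is installed and that the caller has the privileges
    it needs; nothing is reset or repaired here.
    """
    result = subprocess.run(
        ["mii-diag", "-a", "--force", ifname],
        capture_output=True,
        text=True,
    )
    return phy_missing(result.stdout + result.stderr)
```

A boot script could call check_interface("eth0") and log a message
asking for the machine to be unplugged when it returns True.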

-- 
Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Scyld Beowulf cluster system
Annapolis MD 21403			410-990-9993