82558 lock-up bug, even worse?

Stauffer, Walter w.stauffer@galenica.ch
Tue Dec 7 14:55:41 1999


Fellows,

please let me describe a problem I am having with 3 out of a
series of 7 machines with an on-board 82558B Ethernet interface.

These machines will be used to help in a therapy and research
project for handicapped children, which I am supporting in my
spare time. Since this is non-profit, I feel free to ask the
list (using my business e-mail account, since I read this one
much more frequently).

On these 3 machines, the net stops working after some 20 or 30
seconds, or sometimes immediately after boot (on the four other
supposedly identical machines, everything works fine).

The driver loads and starts correctly (see below), and so does
the rest of the network.

However, "eepro-diag -aaf" reports the following when the net
goes bad:

  eepro100-diag.c:v1.01 7/8/99 Donald Becker (becker@cesdis.gsfc.nasa.gov)
  Index #1: Found a Intel 82557 EtherExpressPro100B adapter at 0xde00.
  i82557 chip registers at 0xde00:
    00000000 00000000 00000000 00080002 18203000 00000600
    No interrupt sources are pending.
     The transmit unit state is 'Idle'.
     The receive unit state is 'Idle'.
    This status is unusual for an activated interface.

In this state, the machine does not respond to requests from the
outside, which is very bad, because we want to remote-control it.

The net can be brought back to life for a short time with
"ifconfig eth0 down" and "ifconfig eth0 up", but it freezes again
and again (the output of ifconfig looks normal, by the way).

The following kernel messages appear from time to time, but far
less often than the net freezes. In addition, the driver
usually cannot restore network functionality.

  kernel: eth0: Transmit timed out: status 0000  0000 at 51/66 command 000c0000.
  kernel: eth0: Trying to restart the transmitter...

I am using SuSE Linux 6.2 and have tried kernels 2.2.10 and 2.2.13
with no improvement. I also tried the driver both as a module and
compiled into the kernel, with no difference.

I also tried different hubs, switches and cables at both 10 and
100 Mbit/s, with no difference.

In desperation, I tried NT 4.0 SP5, Win98, and DOS, all with similar
results. Unfortunately, the Win98 driver manages to recover the
interface whenever network traffic originates at the troubled
machine. Therefore, the vendor of the machines almost refused to
see the problem: when you run Win98 and use MS IE to look at some
Web pages, the machine does indeed seem to work. However, when you
ping the machine from outside and it generates no network traffic
of its own, the ping replies stop very soon.
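
For anyone who wants to repeat the ping test from a second machine
and get a number to show the vendor, here is a minimal Python sketch.
It is only an illustration: the function names are made up for this
mail, and it assumes a Linux-style ping that accepts -c (count) and
-W (reply timeout in seconds).

```python
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Send one echo request with the system ping; True if a reply arrived."""
    # Assumption: a Linux-style ping with -c (count) and -W (timeout, seconds).
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def seconds_until_silent(probe, interval_s=1.0, max_probes=120,
                         misses_needed=3, sleep=time.sleep,
                         clock=time.monotonic):
    """Call probe() repeatedly.

    Returns the elapsed seconds at the first of `misses_needed`
    consecutive failed probes, or None if the target kept answering
    for all `max_probes` attempts.
    """
    start = clock()
    misses = 0
    first_miss_at = None
    for _ in range(max_probes):
        if probe():
            # A reply resets the run of consecutive misses.
            misses = 0
            first_miss_at = None
        else:
            if misses == 0:
                first_miss_at = clock() - start
            misses += 1
            if misses >= misses_needed:
                return first_miss_at
        sleep(interval_s)
    return None


# Example use (the address is a placeholder, not one of my machines):
#   t = seconds_until_silent(lambda: ping_once("192.0.2.10"))
#   print("still answering" if t is None else "went silent after %.0f s" % t)
```

Requiring a few misses in a row keeps a single dropped packet from
being counted as a lock-up. The trouble machine must be left idle
during the test, since traffic originating there can hide the problem.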

Short conclusion of a long story: there seems to be some kind of
hardware problem.

The "fine print" on the 82558B chips is the following:

 L909IH17 OK
 L909IH17 OK
 L909IH23 OK
 L907ID72 OK
 L907EB94 bad
 L907EC19 bad
 L903IL73 bad

Can this information be decoded somehow? Is it possible that
there is a faulty series of 82558Bs?

As mentioned before, I am having a hard time getting the
vendor of the machines to acknowledge the problem, so I
would appreciate any comments very much.

Best regards,
Walter



This is from /var/log/messages:

Dec  6 07:33:56 Auricula1 kernel: eepro100.c:v1.08 5/3/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
Dec  6 07:33:56 Auricula1 kernel: eth0: OEM i82557/i82558 10/100 Ethernet at 0xde00, 00:40:CA:13:72:17, IRQ 10.
Dec  6 07:33:56 Auricula1 kernel:   Board assembly 664088-003, Physical connectors present: RJ45
Dec  6 07:33:56 Auricula1 kernel:   Primary interface chip i82555 PHY #1.
Dec  6 07:33:56 Auricula1 kernel:   General self-test: passed.
Dec  6 07:33:56 Auricula1 kernel:   Serial sub-system self-test: passed.
Dec  6 07:33:56 Auricula1 kernel:   Internal registers self-test: passed.
Dec  6 07:33:56 Auricula1 kernel:   ROM checksum self-test: passed (0x24c9f043).
Dec  6 07:33:56 Auricula1 kernel:   Receiver lock-up workaround activated.



This is the normal state, when the interface is running:

eepro-diag -aaf

eepro100-diag.c:v1.01 7/8/99 Donald Becker (becker@cesdis.gsfc.nasa.gov)
Index #1: Found a Intel 82557 EtherExpressPro100B adapter at 0xde00.
i82557 chip registers at 0xde00:
  00000050 022218f0 00000000 00080002 18203000 00000600
  No interrupt sources are pending.
   The transmit unit state is 'Suspended'.
   The receive unit state is 'Ready'.
  This status is normal for an activated but idle interface.
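
For comparing the two register dumps, the unit states seem to follow
from the first word alone. Below is a small Python sketch of that
decoding. It assumes the low 16 bits of the first word are the status
word, with the transmit (command) unit state in bits 7:6 and the
receive unit state in bits 5:2; this layout is my reading of the chip
and is not taken from the diag output itself. The function name is
made up for illustration.

```python
# Assumed layout of the status word (low 16 bits of the first register word):
#   bits 7:6 = transmit (command) unit state
#   bits 5:2 = receive unit state
CU_STATES = {0: "Idle", 1: "Suspended", 2: "Active"}
RU_STATES = {0: "Idle", 1: "Suspended", 2: "No resources", 4: "Ready"}


def decode_unit_states(first_word_hex):
    """Return (transmit_state, receive_state) for the first dumped word."""
    status = int(first_word_hex, 16) & 0xFFFF
    cu = (status >> 6) & 0x3
    ru = (status >> 2) & 0xF
    return (
        CU_STATES.get(cu, "Unknown (%d)" % cu),
        RU_STATES.get(ru, "Unknown (%d)" % ru),
    )


# First word of each dump quoted in this mail.
for label, word in (("hung", "00000000"), ("normal", "00000050")):
    tx, rx = decode_unit_states(word)
    print("%-6s transmit=%s receive=%s" % (label, tx, rx))
```

Under that assumption, 00000050 gives Suspended/Ready and 00000000
gives Idle/Idle, which agrees with what the diag program prints for
the two cases.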