[eepro100] eepro100 82559 problems
Nate Amsden
subscriptions@graphon.com
Thu, 15 Feb 2001 13:27:56 -0800
Hi,
After seeing this message posted by Antwerpen@netsquare.org:
http://www.scyld.com/pipermail/eepro100/2001-February/001509.html
I figured I should post because I have a very similar problem.
We have 3 identical 1U systems built on Supermicro S370SSE motherboards
(at least I'm 99.99999% sure they are; I can't be 100% sure without taking
a system apart). They have dual onboard Intel 82559 NICs.
(Somewhat related:)
When using OpenBSD 2.8 on one of them, the machine seemed to crash
after about 5 minutes of use (firewalling/port forwarding under
very low load, maybe 10kb/s at best).
I have since replaced OpenBSD 2.8 with Debian GNU/Linux 2.2r2 and
kernel 2.2.17 plus many patches, including modules for eepro100 v1.11a.
This machine has been operating perfectly for the past 68 days
20 hours. At another location on the other side of the country
we are trying to deploy the 2nd of the 3 systems, using a similar
configuration (kernel and modules are identical, BIOS settings
match, etc.), and since we deployed it (on Monday, I think)
it has consistently locked up hard every night. Today we
synced the BIOS settings between the unit here and the one there, and
things seemed to be going better; however, the errors are still
showing up, something that has never appeared in the logs of
the unit here.
sample log entry:
Feb 10 05:37:11 gate-nh kernel: eth1: Transmit timed out: status 0050 0080 at 59/61 commands 000c0000 400c0000 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: Tx ring dump, Tx queue 61 / 59:
Feb 10 05:37:11 gate-nh kernel: eth1: 0 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 1 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 2 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 3 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 4 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 5 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 6 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 7 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 8 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 9 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 10 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 11 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 12 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 13 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 14 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 15 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 16 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 17 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 18 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 19 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 20 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 21 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 22 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 23 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 24 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 25 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 26 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: * 27 000c0000.
Feb 10 05:37:11 gate-nh kernel: eth1: 28 400c0000.
Feb 10 05:37:11 gate-nh kernel: eth1: =29 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 30 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 31 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1:Printing Rx ring (next to receive into 143).
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 0 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 1 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 2 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 3 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 4 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 5 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 6 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 7 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 8 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 9 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 10 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 11 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 12 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 13 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 14 c0000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 15 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 16 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 17 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 18 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 19 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 20 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 21 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 22 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 23 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 24 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 25 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 26 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 27 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 28 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 29 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 30 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 31 00000001.
Feb 10 05:37:11 gate-nh kernel: PHY index 1 register 0 is 3100.
Feb 10 05:37:11 gate-nh kernel: PHY index 1 register 1 is 782d.
Feb 10 05:37:11 gate-nh kernel: PHY index 1 register 2 is 02a8.
Feb 10 05:37:11 gate-nh kernel: PHY index 1 register 3 is 0320.
Feb 10 05:37:11 gate-nh kernel: PHY index 1 register 4 is 05e1.
Feb 10 05:37:11 gate-nh kernel: PHY index 1 register 5 is 0021.
Feb 10 05:37:11 gate-nh kernel: PHY index 1 register 21 is 0000.
Feb 10 05:37:11 gate-nh kernel: eth1: Tx ring dump, Tx queue 61 / 59:
Feb 10 05:37:11 gate-nh kernel: eth1: 0 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 1 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 2 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 3 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 4 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 5 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 6 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 7 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 8 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 9 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 10 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 11 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 12 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 13 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 14 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 15 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 16 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 17 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 18 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 19 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 20 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 21 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 22 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 23 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 24 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 25 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 26 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: * 27 000c0000.
Feb 10 05:37:11 gate-nh kernel: eth1: 28 400c0000.
Feb 10 05:37:11 gate-nh kernel: eth1: =29 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 30 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1: 31 000ca000.
Feb 10 05:37:11 gate-nh kernel: eth1:Printing Rx ring (next to receive into 143)
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 0 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 1 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 2 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 3 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 4 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 5 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 6 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 7 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 8 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 9 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 10 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 11 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 12 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 13 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 14 c0000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 15 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 16 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 17 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 18 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 19 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 20 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 21 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 22 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 23 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 24 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 25 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 26 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 27 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 28 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 29 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 30 00000001.
Feb 10 05:37:11 gate-nh kernel: Rx ring entry 31 00000001.
Feb 10 05:37:11 gate-nh kernel: PHY index 1 register 0 is 3100.
Feb 10 05:37:11 gate-nh kernel: PHY index 1 register 1 is 782d.
Feb 10 05:37:11 gate-nh kernel: PHY index 1 register 2 is 02a8.
Feb 10 05:37:11 gate-nh kernel: PHY index 1 register 3 is 0320.
Feb 10 05:37:11 gate-nh kernel: PHY index 1 register 4 is 05e1.
Feb 10 05:37:11 gate-nh kernel: PHY index 1 register 5 is 0021.
Feb 10 05:37:11 gate-nh kernel: PHY index 1 register 21 is 0000.
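The PHY register values at the end of the dump can be read bit by bit. The following is a minimal sketch, assuming registers 0, 1, 4 and 5 use the standard MII layout (control, status, advertisement, link-partner ability). The bit masks come from that standard layout, not from the eepro100 driver, and the names `decode`, `REGISTERS` and `LAYOUT` are made up for illustration.

```python
# Minimal sketch: decode the PHY register values printed in the dump.
# Assumption: registers 0, 1, 4 and 5 follow the standard MII layout.

BMCR = {0x2000: "100Mbps selected",
        0x1000: "autonegotiation enabled",
        0x0100: "full duplex selected"}
BMSR = {0x0020: "autonegotiation complete",
        0x0004: "link up"}
ABILITY = {0x0100: "100baseTx-FD",
           0x0080: "100baseTx-HD",
           0x0040: "10baseT-FD",
           0x0020: "10baseT-HD"}

# Values copied from the "PHY index 1 register N is XXXX" log lines.
REGISTERS = {0: 0x3100, 1: 0x782d, 4: 0x05e1, 5: 0x0021}
LAYOUT = {0: ("control", BMCR),
          1: ("status", BMSR),
          4: ("advertised", ABILITY),
          5: ("link partner", ABILITY)}


def decode(value, table):
    """Return the names of all flags in `table` whose bit is set in `value`."""
    return [name for mask, name in table.items() if value & mask]


if __name__ == "__main__":
    for reg, value in REGISTERS.items():
        label, table = LAYOUT[reg]
        flags = ", ".join(decode(value, table))
        print(f"reg {reg} ({label}) = {value:04x}: {flags}")
```

Under that assumption, register 5 (0021) has only the 10baseT half-duplex ability bit set, which would mean the device on the other end of this port (presumably eth1, since the PHY lines follow the eth1 dump) advertises 10baseT half duplex only. This is an inference from the standard register layout, not something the driver log states.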
I'm not sure what kind of machine was replaced by this one, but I
could find out. It was a Red Hat machine and it ran for about the
past year until we decided to replace it with a racked Debian
box. Any idea what could cause this? It only happened once we
started using the new system. And I bet the OpenBSD crashes
on my end here were the result of something similar; however,
OpenBSD didn't give any errors, it just dropped to the
debugger and sat there until I rebooted it. Buggy chip?
Buggy driver? Hard to imagine the driver is to blame, as
the other system has been running for over 2 months without
a single problem.
running ifconfig on both systems shows:
(on broken system)
4:27pm up 5:48, 1 user, load average: 0.00, 0.00, 0.00
eth0      Link encap:Ethernet HWaddr 00:30:48:11:02:D8
          inet addr:192.168.100.2 Bcast:192.168.100.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:417948 errors:0 dropped:0 overruns:0 frame:0
          TX packets:184575 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:11 Base address:0xb000
eth1      Link encap:Ethernet HWaddr 00:30:48:11:12:16
          inet addr:XX.XX.XX.XX Bcast:XX.255.255.255 Mask:255.255.255.XXX
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:207215 errors:0 dropped:0 overruns:0 frame:0
          TX packets:192204 errors:2 dropped:0 overruns:0 carrier:0
          collisions:157 txqueuelen:100
          Interrupt:5 Base address:0xd000
eth1:0    Link encap:Ethernet HWaddr 00:30:48:11:12:16
          inet addr:XX.XX.XX.XXX Bcast:XX.255.255.255 Mask:255.255.255.XXX
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          Interrupt:5 Base address:0xd000
(on working system)
1:24pm up 68 days, 20:42, 1 user, load average: 0.00, 0.04, 0.06
eth0      Link encap:Ethernet HWaddr 00:30:48:11:02:D9
          inet addr:XX.XX.XX.XX Bcast:XX.XX.XXX.XXX Mask:255.255.255.XXX
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:101475510 errors:0 dropped:0 overruns:0 frame:1
          TX packets:117209873 errors:0 dropped:0 overruns:0 carrier:116
          collisions:15698167 txqueuelen:100
          Interrupt:11 Base address:0x9000
eth0:1    Link encap:Ethernet HWaddr 00:30:48:11:02:D9
          inet addr:XX.XX.XX.XXX Bcast:XX.XX.XX.255 Mask:255.255.255.XXX
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          Interrupt:11 Base address:0x9000
eth1      Link encap:Ethernet HWaddr 00:30:48:11:12:17
          inet addr:192.168.50.20 Bcast:192.168.50.255 Mask:255.255.255.XX
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:123049598 errors:0 dropped:0 overruns:0 frame:0
          TX packets:104000322 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:5 Base address:0xb000
I point this out because eth1 on the broken system has 2 TX errors,
while after all the weeks and packets that have gone through the working
system it shows not a single error. It does show a lot of collisions,
but it is hooked up to a $10 hub...
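To put the collision counts on a common scale, here is a minimal sketch that pulls the TX counters out of an ifconfig stanza and expresses collisions as a fraction of transmitted packets. The helper names `tx_stats` and `collision_ratio` are made up for illustration, and the regular expressions assume the old net-tools output format shown in this message.

```python
import re


def tx_stats(stanza):
    """Return (tx_packets, tx_errors, collisions) from one ifconfig stanza."""
    tx = re.search(r"TX packets:(\d+) errors:(\d+)", stanza)
    col = re.search(r"collisions:(\d+)", stanza)
    return int(tx.group(1)), int(tx.group(2)), int(col.group(1))


def collision_ratio(stanza):
    """Collisions divided by transmitted packets for one interface."""
    packets, _errors, collisions = tx_stats(stanza)
    return collisions / packets


# Counter lines copied from the ifconfig output in this message.
BROKEN_ETH1 = """TX packets:192204 errors:2 dropped:0 overruns:0 carrier:0
collisions:157 txqueuelen:100"""
WORKING_ETH0 = """TX packets:117209873 errors:0 dropped:0 overruns:0 carrier:116
collisions:15698167 txqueuelen:100"""

if __name__ == "__main__":
    for name, stanza in [("broken eth1", BROKEN_ETH1),
                         ("working eth0", WORKING_ETH0)]:
        print(f"{name}: {collision_ratio(stanza):.2%} collisions per TX packet")
```

With the counters above, the broken system's eth1 comes out well under 0.1% while the working system's eth0 is around 13%, so the collision rate by itself does not separate the failing box from the healthy one.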
here is the kernel log for the broken system when the kernel loaded
the driver:
Feb 15 02:38:45 gate-nh kernel: eepro100.c:v1.11a 7/31/2000 Donald Becker <becker@scyld.com>
Feb 15 02:38:45 gate-nh kernel: http://www.scyld.com/network/eepro100.html
Feb 15 02:38:45 gate-nh kernel: eth0: OEM i82557/i82558 10/100 Ethernet at 0xc808b000, 00:30:48:11:02:D8, IRQ 11.
Feb 15 02:38:45 gate-nh kernel: Receiver lock-up bug exists -- enabling work-around.
Feb 15 02:38:45 gate-nh kernel: Board assembly 000000-000, Physical connectors present: RJ45
Feb 15 02:38:45 gate-nh kernel: Primary interface chip i82555 PHY #1.
Feb 15 02:38:45 gate-nh kernel: General self-test: passed.
Feb 15 02:38:45 gate-nh kernel: Serial sub-system self-test: passed.
Feb 15 02:38:45 gate-nh kernel: Serial sub-system self-test: passed.
Feb 15 02:38:45 gate-nh kernel: Internal registers self-test: passed.
Feb 15 02:38:45 gate-nh kernel: ROM checksum self-test: passed (0x04f4518b).
Feb 15 02:38:45 gate-nh kernel: eth1: OEM i82557/i82558 10/100 Ethernet at 0xc808d000, 00:30:48:11:12:16, IRQ 5.
Feb 15 02:38:45 gate-nh kernel: Receiver lock-up bug exists -- enabling work-around.
Feb 15 02:38:45 gate-nh kernel: Board assembly a19716-001, Physical connectors present: RJ45
Feb 15 02:38:45 gate-nh kernel: Primary interface chip i82555 PHY #1.
Feb 15 02:38:45 gate-nh kernel: General self-test: passed.
Feb 15 02:38:45 gate-nh kernel: Serial sub-system self-test: passed.
Feb 15 02:38:45 gate-nh kernel: Internal registers self-test: passed.
Feb 15 02:38:45 gate-nh kernel: ROM checksum self-test: passed (0x04f4518b).
I imagine the output is similar for the working system; however, the bootup
logs are rotated and overwritten after a month of uptime.
Network load on both systems is extremely light: MRTG reports average
network traffic over the past 5 weeks of 2.9kB/s both ways for the
broken system. The working one averages 13-14kB/s both ways over
the same period. Both systems are on 1Mbit DSL connections.
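As a rough check on how light that load is, here is a minimal sketch that converts the MRTG averages into a fraction of the DSL line rate. It assumes "1Mbit" means 1,000,000 bits per second and that MRTG's kB is 1000 bytes, and it uses 13.5kB/s as a midpoint for the working system. None of those conventions is stated in this message.

```python
# Assumptions (not stated in the message): decimal units throughout.
LINK_BITS_PER_SEC = 1_000_000   # "1Mbit" DSL line
BYTES_PER_KB = 1000             # MRTG kB taken as 1000 bytes


def utilization(kbytes_per_sec):
    """Fraction of the line rate used by an average rate given in kB/s."""
    return kbytes_per_sec * BYTES_PER_KB * 8 / LINK_BITS_PER_SEC


if __name__ == "__main__":
    for name, rate in [("broken", 2.9), ("working", 13.5)]:
        print(f"{name}: {rate} kB/s is {utilization(rate):.1%} of the line")
```

Under those assumptions the broken system averages a little over 2% of the DSL line and the working one about 11%, both far below what a 10/100 NIC can carry.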
The 3rd is sitting on a shelf waiting for someone to get the time
to set it up. It's in another state, so I don't have access to it.
The machines themselves each have a single P3-733MHz, 128MB RAM,
that Supermicro motherboard, and a single 20GB Quantum IDE drive.
Any ideas would be appreciated :) I have a feeling it will
lock up again tonight.
thanks!
nate
--
Nate Amsden
System Administrator
GraphOn
http://www.graphon.com