[eepro100] eepro100 82559 problems

Nate Amsden natea@graphon.com
Fri, 16 Feb 2001 11:09:47 -0800


Looks like we may have solved the problem. We switched the configurations
of eth0 and eth1 on the other machine so that the sub-interface is
on eth0 instead of eth1. The system didn't lock up last night
and has not logged a single error since.

So... what could this mean?

tia.

nate

Nate Amsden wrote:
> 
> hi
> 
> after seeing this message posted by Antwerpen@netsquare.org:
> http://www.scyld.com/pipermail/eepro100/2001-February/001509.html
> 
> i figured i should post because i have a very similar problem.
> 
> We have 3 identical 1U systems running Supermicro S370SSE motherboards
> (at least I'm 99.99999% sure that's the model; I can't be 100% sure without
> taking the system apart). They have dual onboard Intel 82559 NICs.
> 
> (somewhat related..)
> When using OpenBSD 2.8 on one of them, the machine seemed to crash
> after about 5 minutes of use (firewalling/port forwarding under
> very low load, maybe 10kb/s at best).
> 
> I have since replaced OpenBSD 2.8 with Debian GNU/Linux 2.2r2 and
> kernel 2.2.17 + many patches, including modules for eepro100 v1.11a.
> This machine has been operating perfectly for the past 68 days
> 20 hours. At another location on the other side of the country
> we are trying to deploy the 2nd of 3 systems, using a similar
> configuration (kernel and modules are identical, BIOS settings
> match, etc.). Since we deployed it (on Monday, I think)
> it has consistently locked up hard every night. Today we
> synced the BIOS settings between the unit here and the one there, and
> things seemed to be going better; however, the errors are still
> showing up, something that has never shown up in the logs of
> the unit here.
> 
> sample log entry:
> 
> Feb 10 05:37:11 gate-nh kernel: eth1: Transmit timed out: status 0050  0080 at 59/61 commands 000c0000 400c0000 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1: Tx ring dump,  Tx queue 61 / 59:
> Feb 10 05:37:11 gate-nh kernel: eth1:   0 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   1 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   2 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   3 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   4 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   5 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   6 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   7 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   8 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   9 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   10 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   11 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   12 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   13 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   14 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   15 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   16 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   17 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   18 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   19 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   20 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   21 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   22 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   23 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   24 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   25 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   26 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1: * 27 000c0000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   28 400c0000.
> Feb 10 05:37:11 gate-nh kernel: eth1:  =29 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   30 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   31 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:Printing Rx ring (next to receive into 143).
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 0  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 1  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 2  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 3  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 4  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 5  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 6  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 7  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 8  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 9  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 10  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 11  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 12  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 13  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 14  c0000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 15  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 16  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 17  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 18  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 19  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 20  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 21  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 22  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 23  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 24  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 25  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 26  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 27  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 28  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 28  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 29  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 30  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 31  00000001.
> Feb 10 05:37:11 gate-nh kernel:   PHY index 1 register 0 is 3100.
> Feb 10 05:37:11 gate-nh kernel:   PHY index 1 register 1 is 782d.
> Feb 10 05:37:11 gate-nh kernel:   PHY index 1 register 2 is 02a8.
> Feb 10 05:37:11 gate-nh kernel:   PHY index 1 register 3 is 0320.
> Feb 10 05:37:11 gate-nh kernel:   PHY index 1 register 4 is 05e1.
> Feb 10 05:37:11 gate-nh kernel:   PHY index 1 register 5 is 0021.
> Feb 10 05:37:11 gate-nh kernel:   PHY index 1 register 21 is 0000.
> Feb 10 05:37:11 gate-nh kernel: eth1: Tx ring dump,  Tx queue 61 / 59:
> Feb 10 05:37:11 gate-nh kernel: eth1:   0 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   1 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   2 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   3 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   4 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   5 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   6 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   7 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   8 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   9 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   10 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   11 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   12 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   13 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   14 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   15 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   16 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   17 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   18 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   19 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   20 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   21 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   22 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   23 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   24 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   25 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   26 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1: * 27 000c0000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   28 400c0000.
> Feb 10 05:37:11 gate-nh kernel: eth1:  =29 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   30 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:   31 000ca000.
> Feb 10 05:37:11 gate-nh kernel: eth1:Printing Rx ring (next to receive into 143)
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 0  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 1  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 2  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 3  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 4  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 5  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 6  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 7  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 8  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 9  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 10  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 11  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 12  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 13  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 14  c0000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 15  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 16  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 17  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 18  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 19  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 20  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 21  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 22  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 23  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 24  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 25  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 26  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 27  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 28  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 29  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 30  00000001.
> Feb 10 05:37:11 gate-nh kernel:   Rx ring entry 31  00000001.
> Feb 10 05:37:11 gate-nh kernel:   PHY index 1 register 0 is 3100.
> Feb 10 05:37:11 gate-nh kernel:   PHY index 1 register 1 is 782d.
> Feb 10 05:37:11 gate-nh kernel:   PHY index 1 register 2 is 02a8.
> Feb 10 05:37:11 gate-nh kernel:   PHY index 1 register 3 is 0320.
> Feb 10 05:37:11 gate-nh kernel:   PHY index 1 register 4 is 05e1.
> Feb 10 05:37:11 gate-nh kernel:   PHY index 1 register 5 is 0021.
> Feb 10 05:37:11 gate-nh kernel:   PHY index 1 register 21 is 0000.
> 
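A note on reading the Tx ring dump above: the header says "Tx queue 61 / 59" and the two markers sit at entries 29 and 27, which are exactly 61 mod 32 and 59 mod 32. A plausible reading (inferred from the numbers, not checked against the driver source) is that '=' marks the next slot to be filled and '*' the oldest slot not yet reclaimed, so two commands, entries 27 and 28, were outstanding when the timeout fired. A minimal Python sketch of that arithmetic, where RING_SIZE, the regexes and the function name are my own assumptions:

```python
import re

RING_SIZE = 32  # assumption: the dump lists entries 0..31

# A few lines taken from the dump above (syslog prefix dropped).
SAMPLE = """\
eth1: Tx ring dump,  Tx queue 61 / 59:
eth1:   26 000ca000.
eth1: * 27 000c0000.
eth1:   28 400c0000.
eth1:  =29 000ca000.
eth1:   30 000ca000.
"""

HEADER = re.compile(r"Tx queue (\d+) / (\d+):")
ENTRY = re.compile(r"eth\d+:\s*([*=]?)\s*(\d+) ([0-9a-f]{8})\.")


def parse_tx_dump(text):
    """Return (queue counters, marker positions, status word per entry)."""
    counters, marks, status = None, {}, {}
    for line in text.splitlines():
        m = HEADER.search(line)
        if m:
            counters = (int(m.group(1)), int(m.group(2)))
            continue
        m = ENTRY.search(line)
        if m:
            mark, index, word = m.group(1), int(m.group(2)), m.group(3)
            status[index] = word
            if mark:
                marks[mark] = index
    return counters, marks, status


counters, marks, status = parse_tx_dump(SAMPLE)
newest, oldest = counters

# Ring slots between the two counters, i.e. commands still outstanding.
pending = [n % RING_SIZE for n in range(oldest, newest)]

print("markers:", marks)
print("counters mod ring size:", newest % RING_SIZE, oldest % RING_SIZE)
print("pending slots:", pending, [status[i] for i in pending])
```

The pending slots are also the only two entries whose status word differs from 000ca000, which fits the idea that the chip stopped working through the queue at that point.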
> I'm not sure what kind of machine was replaced by this one, but I
> could find out. It was a Red Hat machine and it ran for about the
> past year until we decided to replace it with a racked Debian
> box. Any idea what could cause this? It only happened once we
> started using the new system. And I bet the OpenBSD crashes
> on my end here were the result of something similar. However,
> OpenBSD didn't give any errors; it just dropped to the
> debugger and sat there until I rebooted it. Buggy chip?
> Buggy driver? Hard to imagine the driver is to blame, as
> this other system has been running for over 2 months without
> a single problem.
> 
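One more data point that can be pulled out of the log: the PHY register values. Assuming the i82555 follows the standard MII register layout for registers 0, 1, 4 and 5 (control, status, advertisement, link partner ability), the values can be decoded with a few bit masks. The sketch below is only an illustration; the names are mine, and 100baseT4 is left out of the ability table.

```python
# Ability bits shared by MII registers 4 (advertisement) and 5 (link partner),
# listed from highest to lowest negotiation priority. 100baseT4 is omitted.
ABILITIES = [
    (0x0100, "100baseTx-FD"),
    (0x0080, "100baseTx-HD"),
    (0x0040, "10baseT-FD"),
    (0x0020, "10baseT-HD"),
]

BMCR_AUTONEG_ENABLE = 0x1000  # register 0, bit 12
BMSR_AUTONEG_DONE = 0x0020    # register 1, bit 5
BMSR_LINK_UP = 0x0004         # register 1, bit 2


def abilities(value):
    """Names of the ability bits set in a register 4 or 5 value."""
    return [name for bit, name in ABILITIES if value & bit]


def common_mode(advertised, partner):
    """Highest-priority mode present in both registers, or None."""
    both = advertised & partner
    for bit, name in ABILITIES:
        if both & bit:
            return name
    return None


# Register values as printed for eth1 in the log above.
regs = {0: 0x3100, 1: 0x782D, 4: 0x05E1, 5: 0x0021}

print("autoneg enabled:", bool(regs[0] & BMCR_AUTONEG_ENABLE))
print("autoneg done:   ", bool(regs[1] & BMSR_AUTONEG_DONE))
print("link up:        ", bool(regs[1] & BMSR_LINK_UP))
print("we advertise:   ", abilities(regs[4]))
print("partner offers: ", abilities(regs[5]))
print("common mode:    ", common_mode(regs[4], regs[5]))
```

If that reading is right, eth1 on the broken box has link and is running 10baseT half duplex, since that is the only mode the other end reports. That would at least be consistent with the non-zero collision count on that interface.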
> running ifconfig on both systems shows:
> (on broken system)
>   4:27pm  up  5:48,  1 user,  load average: 0.00, 0.00, 0.00
> eth0      Link encap:Ethernet  HWaddr 00:30:48:11:02:D8
>           inet addr:192.168.100.2  Bcast:192.168.100.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:417948 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:184575 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:100
>           Interrupt:11 Base address:0xb000
> 
> eth1      Link encap:Ethernet  HWaddr 00:30:48:11:12:16
>           inet addr:XX.XX.XX.XX  Bcast:XX.255.255.255  Mask:255.255.255.XXX
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:207215 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:192204 errors:2 dropped:0 overruns:0 carrier:0
>           collisions:157 txqueuelen:100
>           Interrupt:5 Base address:0xd000
> 
> eth1:0    Link encap:Ethernet  HWaddr 00:30:48:11:12:16
>           inet addr:XX.XX.XX.XXX  Bcast:XX.255.255.255  Mask:255.255.255.XXX
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           Interrupt:5 Base address:0xd000
> 
> (on working system)
>   1:24pm  up 68 days, 20:42,  1 user,  load average: 0.00, 0.04, 0.06
> eth0      Link encap:Ethernet  HWaddr 00:30:48:11:02:D9
>           inet addr:XX.XX.XX.XX  Bcast:XX.XX.XXX.XXX  Mask:255.255.255.XXX
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:101475510 errors:0 dropped:0 overruns:0 frame:1
>           TX packets:117209873 errors:0 dropped:0 overruns:0 carrier:116
>           collisions:15698167 txqueuelen:100
>           Interrupt:11 Base address:0x9000
> 
> eth0:1    Link encap:Ethernet  HWaddr 00:30:48:11:02:D9
>           inet addr:XX.XX.XX.XXX  Bcast:XX.XX.XX.255  Mask:255.255.255.XXX
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           Interrupt:11 Base address:0x9000
> 
> eth1      Link encap:Ethernet  HWaddr 00:30:48:11:12:17
>           inet addr:192.168.50.20  Bcast:192.168.50.255  Mask:255.255.255.XX
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:123049598 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:104000322 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:100
>           Interrupt:5 Base address:0xb000
> 
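To compare the counters from the two boxes without eyeballing them, here is a rough Python sketch that pulls the packet, error and collision counts out of ifconfig output in the format shown above and works out collisions per transmitted packet. The function names and regexes are my own, and the parser only understands this older ifconfig layout.

```python
import re

# Counter lines copied from the ifconfig output above.
BROKEN = """\
eth1      Link encap:Ethernet  HWaddr 00:30:48:11:12:16
          RX packets:207215 errors:0 dropped:0 overruns:0 frame:0
          TX packets:192204 errors:2 dropped:0 overruns:0 carrier:0
          collisions:157 txqueuelen:100
"""

WORKING = """\
eth0      Link encap:Ethernet  HWaddr 00:30:48:11:02:D9
          RX packets:101475510 errors:0 dropped:0 overruns:0 frame:1
          TX packets:117209873 errors:0 dropped:0 overruns:0 carrier:116
          collisions:15698167 txqueuelen:100
"""


def parse_ifconfig(text):
    """Map interface name -> dict of the counters found under it."""
    stats, current = {}, None
    for line in text.splitlines():
        m = re.match(r"(\S+)\s+Link encap", line)
        if m:
            current = stats.setdefault(m.group(1), {})
            continue
        if current is None:
            continue
        m = re.search(r"(RX|TX) packets:(\d+) errors:(\d+)", line)
        if m:
            prefix = m.group(1).lower()
            current[prefix + "_packets"] = int(m.group(2))
            current[prefix + "_errors"] = int(m.group(3))
        m = re.search(r"collisions:(\d+)", line)
        if m:
            current["collisions"] = int(m.group(1))
    return stats


def collision_ratio(counters):
    """Collisions per transmitted packet (0.0 if nothing was sent)."""
    sent = counters.get("tx_packets", 0)
    return counters.get("collisions", 0) / sent if sent else 0.0


broken = parse_ifconfig(BROKEN)["eth1"]
working = parse_ifconfig(WORKING)["eth0"]

print("broken eth1:  tx_errors=%d ratio=%.5f"
      % (broken["tx_errors"], collision_ratio(broken)))
print("working eth0: tx_errors=%d ratio=%.5f"
      % (working["tx_errors"], collision_ratio(working)))
```

On these numbers the working box sees roughly 13 collisions per 100 packets sent on the hub side and still reports errors:0, while the broken box has well under 0.1% collisions but 2 TX errors.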
> I point this out because eth1 on the broken system had 2 TX errors,
> while the working system, after all the weeks and packets that have gone
> through it, shows not a single error. It does show a lot of collisions,
> but it is hooked up to a $10 hub...
> 
> here is the kernel log for the broken system when the kernel loaded
> the driver:
> 
> Feb 15 02:38:45 gate-nh kernel: eepro100.c:v1.11a 7/31/2000 Donald Becker <becker@scyld.com>
> Feb 15 02:38:45 gate-nh kernel:   http://www.scyld.com/network/eepro100.html
> Feb 15 02:38:45 gate-nh kernel: eth0: OEM i82557/i82558 10/100 Ethernet at 0xc808b000, 00:30:48:11:02:D8, IRQ 11.
> Feb 15 02:38:45 gate-nh kernel:   Receiver lock-up bug exists -- enabling work-around.
> Feb 15 02:38:45 gate-nh kernel:   Board assembly 000000-000, Physical connectors present: RJ45
> Feb 15 02:38:45 gate-nh kernel:   Primary interface chip i82555 PHY #1.
> Feb 15 02:38:45 gate-nh kernel:   General self-test: passed.
> Feb 15 02:38:45 gate-nh kernel:   Serial sub-system self-test: passed.
> Feb 15 02:38:45 gate-nh kernel:   Serial sub-system self-test: passed.
> Feb 15 02:38:45 gate-nh kernel:   Internal registers self-test: passed.
> Feb 15 02:38:45 gate-nh kernel:   ROM checksum self-test: passed (0x04f4518b).
> Feb 15 02:38:45 gate-nh kernel: eth1: OEM i82557/i82558 10/100 Ethernet at 0xc808d000, 00:30:48:11:12:16, IRQ 5.
> Feb 15 02:38:45 gate-nh kernel:   Receiver lock-up bug exists -- enabling work-around.
> Feb 15 02:38:45 gate-nh kernel:   Board assembly a19716-001, Physical connectors present: RJ45
> Feb 15 02:38:45 gate-nh kernel:   Primary interface chip i82555 PHY #1.
> Feb 15 02:38:45 gate-nh kernel:   General self-test: passed.
> Feb 15 02:38:45 gate-nh kernel:   Serial sub-system self-test: passed.
> Feb 15 02:38:45 gate-nh kernel:   Internal registers self-test: passed.
> Feb 15 02:38:45 gate-nh kernel:   ROM checksum self-test: passed (0x04f4518b).
> 
> I imagine the output is similar for the working system; however, the bootup
> logs are cycled and overwritten after a month of uptime.
> 
> network load on both systems is extremely light, MRTG reports over
> the past 5 weeks average network traffic 2.9kB/s both ways for the
> broken system. the working one averages 13-14kB/s both ways for
> the past 5 weeks. both systems are on 1Mbit dsl connections.
> 
> The 3rd is sitting on a shelf waiting for someone to get the time
> to set it up. It's in another state so I don't have access to it.
> 
> The machines themselves are single P3-733MHz, 128MB RAM, using
> that Supermicro motherboard and a single 20GB Quantum IDE drive.
> 
> Any ideas would be appreciated :) I have a feeling it will
> lock up again tonight.
> 
> thanks!
> 
> nate
> 
> --
> Nate Amsden
> System Administrator
> GraphOn
> http://www.graphon.com

-- 
Nate Amsden
System Administrator
GraphOn
http://www.graphon.com