3c905b & very high RX-ERR rates

esiewick@digipro.com esiewick@digipro.com
Wed Apr 21 11:28:30 1999


Hi.

Is the 3c905b rev 4 hyper-sensitive to media faults?  I'm beginning to think 
I'm looking at a timing issue in the driver vs other 100-Tx cards/drivers.
Is there a compile time tweak for this?

I'm trying to isolate a problem causing very high RX-ERR rates (about
1.5%) on a small 100-TX network.  RX-ERR problems usually suggest problems
on the network, not on the card or its driver.  I'm posting this to the
linux-vortex-bug list because it's only showing up on workstations with
3c905b (rev 4) cards (and because I'm now thrashing around for any and all
hints). The same hub ports and TP cables work just fine with a workstation
with an embedded Intel EtherExpress Pro NIC.  And swapping in other 3c905b 
(rev 4) cards doesn't help.
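
For reference, here is a minimal sketch of how an error rate like the 1.5%
above can be worked out from two readings of the RX-OK and RX-ERR columns of
`netstat -i`, taken before and after a transfer. The function name, the
choice to divide by ok + err, and the counter values are all assumptions
made up for illustration; they are not measurements from these machines.

```python
def rx_err_percent(before, after):
    """Percent of received frames counted as errors between two samples.

    Each sample is a (rx_ok, rx_err) pair. Dividing by ok + err is an
    assumption about how the rate is defined.
    """
    ok = after[0] - before[0]
    err = after[1] - before[1]
    total = ok + err
    if total <= 0:
        return 0.0  # no traffic between samples, nothing to report
    return 100.0 * err / total


# Made-up counter values, chosen only to illustrate the arithmetic.
before = (1000000, 2000)
after = (1197000, 5000)
print("%.2f%%" % rx_err_percent(before, after))
```

Sampling twice and taking the difference keeps errors from earlier uptime
out of the figure.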

The workstations I'm futzing around with, trying to find a working
solution, are Micronix M54Hi Pentium 200MHz machines running redhat 5.2,
kernel 2.2.6 w/ 3c59x.c 0.99H.  The NIC is a 3c905b revision 4.  The
workstations involved are running:

redhat 5.2, kernel 2.0.36 w/ 3c59x.c 0.99E (the E did work), and
redhat 5.2, kernel 2.0.36 w/ 3c59x.c 0.99H (same machine as above)
redhat 5.2, kernel 2.2.6  w/ 3c59x.c 0.99H
redhat 5.2, kernel 2.2.6 (SMP) w/ embedded Intel EE Pro.

Curious bit 1:  setting vortex_debug = 6 actually helps the situation --
error rate drops to about 0.1%. [Does the 3c59x.c code need air brakes?] Is
there a known tweak for this problem?

Curious bit 2: using a large FTP transfer between workstations, either 
direction, I see the following RX-ERR problems:

                  FROM:
                  2.2.6/0.99H  2.0.36/0.99E  2.0.36/0.99H  2.2.6/EE Pro

TO: 2.2.6/0.99H   unknown      FR: high      TO: high      TO: high
    2.0.36/0.99E  TO: high     unknown       unknown       TO: high
    2.0.36/0.99H  TO: high     unknown       unknown       TO: high
    2.2.6/EE Pro  FR: high     FR: high      OK            unknown

[I am not prepared to upgrade the 2.0.36 workstation to 2.2.6, although
the results of testing between like machines would be interesting.]

I *think* the current RX-ERR problems popped up after upgrading the
kernels for these workstations, though I have to admit nobody can
pin-point the week (or even the month) when the problems started.

The cable runs are short, each being only 1 meter.  I've tried 3 meter
cables (same vendor) with the same results.

Full- vs. half-duplex doesn't seem to have an effect on this issue.
Autonegotiation vs. insmod with various options doesn't change things.  The
hub is a 100-Tx only 3Com TP800, so I can't test at 10baseT.

Debug output to syslogd gets busted up; the beginnings of most lines don't
make it:

Apr 20 22:38:01 walker kernel: loop, status e201.
Apr 20 22:38:11 walker kernel: oop, status e201.
Apr 20 22:38:41 walker kernel: ying to send a packet, Tx index 89712.
Apr 20 22:40:21 walker kernel:  status e000.
Apr 20 22:41:26 walker kernel: 0: Trying to send a packet, Tx index 129489.
Apr 20 22:42:16 walker kernel:  In interrupt loop, status e201.
Apr 20 22:44:38 walker kernel: us 60008042.
Apr 20 22:49:13 walker kernel: 222916.
Apr 20 22:50:18 walker kernel:  a packet, Tx index 238462.
Apr 20 22:50:58 walker kernel: cy 6 ticks.
Apr 20 22:51:21 walker kernel: tatus e201.
Apr 20 22:53:03 walker kernel: h0: interrupt, status e201, latency 2 ticks.
Apr 20 22:53:28 walker kernel: cket, Tx index 284227.
Apr 20 22:53:43 walker kernel: 01, latency 2 ticks.
Apr 20 22:54:21 walker kernel: cy 2 ticks.
Apr 20 22:57:01 walker kernel: 7a.
Apr 20 22:57:27 walker kernel:  007a.
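
The status words that survive in these fragments (e201, e000) can be picked
apart bit by bit. The sketch below is hedged: the bit names are as recalled
from the vortex_status enum in 3c59x.c, and the reading of the top three
bits as the current register window is inferred from the vortex-diag dump
(the last word of Windows 1 through 7 reads 2000, 4000, ... e000). Treat
both as assumptions and check them against the driver source actually in
use. The helper name decode_status is made up for illustration.

```python
# Bit names as recalled from the vortex_status enum in 3c59x.c.
# This table is an assumption; verify it against your copy of the driver.
STATUS_BITS = [
    (0x0001, "IntLatch"),
    (0x0002, "HostError"),
    (0x0004, "TxComplete"),
    (0x0008, "TxAvailable"),
    (0x0010, "RxComplete"),
    (0x0020, "RxEarly"),
    (0x0040, "IntReq"),
    (0x0080, "StatsFull"),
    (0x0100, "DMADone"),
    (0x0200, "DownComplete"),
    (0x0400, "UpComplete"),
    (0x0800, "DMAInProgress"),
    (0x1000, "CmdInProgress"),
]


def decode_status(word):
    """Split a 16-bit status word into (window number, list of set bits)."""
    window = (word >> 13) & 0x7  # assumed: top three bits = register window
    names = [name for bit, name in STATUS_BITS if word & bit]
    return window, names


for text in ("e201", "e000"):
    window, names = decode_status(int(text, 16))
    print(text, "window", window, " ".join(names) or "(no bits set)")
```

Under those assumptions, e201 is window 7 with IntLatch and DownComplete
set, and e000 is window 7 with nothing pending.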



vortex-diag.c:v1.07 11/24/98 Donald Becker (becker@cesdis.gsfc.nasa.gov)
Found a 3Com PCI Ethernet 3c905b rev 4 at 0xfc00.
The Vortex chip may be active, so FIFO registers will not be read.
To see all register values use the '-f' flag.
Initial window 7, registers values by window:
  Window 0: 0000 0000 0000 0000 f5f5 0000 0000 0000.
  Window 1: FIFO FIFO 0000 0000 0000 0000 0000 2000.
  Window 2: 1000 6f4b 75fc 0000 0000 0000 000a 4000.
  Window 3: 0000 0140 05ea 0020 000a 0800 0800 6000.
  Window 4: 0000 0000 0000 0cd2 0000 8880 0000 8000.
  Window 5: 1ffc 0000 0000 0600 0807 069e 06c6 a000.
  Window 6: 0000 0000 0000 0000 0000 0000 0000 c000.
  Window 7: 0000 0000 0000 0000 0000 0000 0000 e000.
Vortex chip registers at 0xfc00
  0xFC10: **FIFO** 00000000 0000000a *STATUS*
  0xFC20: 00000020 00000000 00080000 00000004
  0xFC30: 00000000 8f887078 01a931a0 00080004
 Indication enable is 06c6, interrupt enable is 069e.
 No interrupt sources are pending.
 Transceiver/media interfaces available:  100baseTx 10baseT.
 MAC settings: full-duplex.
 Station address set to 00:10:4b:6f:fc:75.
 Configuration options 4000.

Something doesn't mesh between these cards and vortex-diag, though I've
seen similar results with a 3c509.

./vortex-diag -e
vortex-diag.c:v1.07 11/24/98 Donald Becker (becker@cesdis.gsfc.nasa.gov)
Found a 3Com PCI Ethernet 3c905b rev 4 at 0xfc00.
Parsing the EEPROM of a 3Com Vortex/Boomerang:
 3Com Node Address 00:10:00:10:00:10 (used as a unique ID only).
 OEM Station address 00:10:00:10:00:10 (used as the ethernet address).
 Manufacture date (MM/DD/YY) 0/16/0, division , product .
Options: none.
  Vortex format checksum is incorrect (00 vs. 10).
 Cyclone format checksum is incorrect (00 vs. 10).
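
As a cross-check on the mismatch, the station address in the register dump
can be rebuilt by hand from the first three words of Window 2 (1000 6f4b
75fc), reading each word low byte first. That byte order is inferred from
the dump itself, not from 3Com documentation, and the helper name
words_to_mac is made up for illustration.

```python
def words_to_mac(words):
    """Format 16-bit words as a MAC address, low byte of each word first."""
    octets = []
    for w in words:
        octets.append(w & 0xFF)         # low byte comes first
        octets.append((w >> 8) & 0xFF)  # then the high byte
    return ":".join("%02x" % o for o in octets)


# First three words of Window 2 from the register dump above.
print(words_to_mac([0x1000, 0x6F4B, 0x75FC]))
```

The result agrees with the "Station address set to" line from the register
dump, so the registers look sane; it is the EEPROM parse
(00:10:00:10:00:10) that disagrees.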


Any and all clues or hints welcome.

Edward Siewick
--
  ESiewick@DigiPro.com               DigiPro Digital Productions, LLC
  Voice:  703-522-8465                   3100 North Quincy Street
  Fax:    703-522-8417                  Arlington, Virginia  22207