[vortex] Connectivity Takes a Break

Montee, Josh JMontee@dmenet.com
Mon, 30 Jul 2001 17:12:59 -0400


Hello
I am the Linux system administrator at my company, and I am responsible
for the availability of 3 very important servers that run our network.
For almost a year now I've been intermittently encountering a problem
where, for no apparent reason (the incidents do not seem to be correlated),
the connectivity of our Linux machines seems to 'take a break', so to
speak.  Sometimes it's just a few moments, sometimes a few minutes, and
sometimes upwards of 10 minutes, at which point I'm forced to reboot the
servers.  These servers are the backbone of the whole company, so I can't
have them going down for breaks like this.  Here is my configuration:

RedHat 6.2 (Kernel 2.2.14-5.0)
Pentium II 300 CPU, 64MB RAM
Two 3Com 3C905B NICs  (3c59x.c:v0.99H)
Cirrus Logic Video Card
Western Digital 1.2 GB Hard Drive

I've got another server that's almost identical and has the same
intermittent problem.  There are over 300 client workstations here, so I
thought perhaps the aging system had reached capacity and was simply being
overloaded.  A few weeks ago I upgraded the machine to:

RedHat 6.2 (Kernel 2.2.14-5.0)
AMD Thunderbird 850 MHz, 256 MB RAM
Microstar K7Pro Motherboard (I believe)
Two (different) 3Com 3C905B-TX NICs (3c59x.c:v0.99Qk)
Trident TG9660 Video Card
Western Digital 2.1 GB Hard Drive

I still have the same problem.  My company thinks the upgrade fixed it, but
today I noticed the same problem occur for about 3 minutes.  Both of the
NICs are connected to a cascaded stack of 4 Nortel BayStack 450-24s, each
running firmware 3.1.0.22.  I checked, but /var/log/messages shows nothing
occurring.  Typing dmesg at the command prompt gives (I cut and pasted the
relevant text):

3c59x.c:v0.99Qk 7/5/2000 Donald Becker, becker@scyld.com
  http://www.scyld.com/network/vortex.html
eth0: 3Com 3c905B Cyclone 100baseTx at 0xd800,  00:50:da:d6:74:18, IRQ 9
  8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
  MII transceiver found at address 24, status 786d.
  MII transceiver found at address 0, status 786d.
  Enabling bus-master transmits and whole-frame receives.
eth1: 3Com 3c905B Cyclone 100baseTx at 0xd400,  00:50:da:2e:1a:f3, IRQ 10
  8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
  MII transceiver found at address 24, status 786d.
  MII transceiver found at address 0, status 786d.
  Enabling bus-master transmits and whole-frame receives.
eth1: Setting half-duplex based on MII #24 link partner capability of 4081.
eth1: Transmit error, Tx status register 82.
eth1: Transmit error, Tx status register 82.
eth1: Transmit error, Tx status register 82.
...
...  I got this error 34 times, shortened for your convenience
...
eth1: Transmit error, Tx status register 82.
eth1: Setting full-duplex based on MII #24 link partner capability of 4101.
[root@ns1 /root]# 
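
The driver messages above are regular enough to tally mechanically, which
might help line up bursts of transmit errors with the duplex changes.  The
following is only a sketch and not part of the original setup: it assumes
Python 3 is available on some machine the dmesg text can be copied to, and
the name summarize is made up for illustration.  The two patterns are
copied from the message formats shown above.

```python
import re
from collections import Counter

# Patterns taken from the 3c59x messages quoted in this post.
TX_ERROR = re.compile(
    r"(eth\d+): Transmit error, Tx status register ([0-9a-fA-F]+)\.")
DUPLEX = re.compile(
    r"(eth\d+): Setting (half|full)-duplex based on MII #(\d+) "
    r"link partner capability of ([0-9a-fA-F]+)\.")


def summarize(lines):
    """Count Tx errors per (interface, status) and list duplex changes in order."""
    tx_errors = Counter()
    duplex_changes = []
    for line in lines:
        m = TX_ERROR.search(line)
        if m:
            tx_errors[(m.group(1), m.group(2).lower())] += 1
            continue
        m = DUPLEX.search(line)
        if m:
            # (interface, duplex chosen, link partner word as hex text)
            duplex_changes.append((m.group(1), m.group(2), m.group(4).lower()))
    return tx_errors, duplex_changes


# A shortened copy of the output above, used here as sample input.
sample = [
    "eth1: Setting half-duplex based on MII #24 link partner capability of 4081.",
    "eth1: Transmit error, Tx status register 82.",
    "eth1: Transmit error, Tx status register 82.",
    "eth1: Setting full-duplex based on MII #24 link partner capability of 4101.",
]

errors, changes = summarize(sample)
for (iface, status), count in sorted(errors.items()):
    print(f"{iface}: {count} transmit errors with Tx status {status}")
for iface, duplex, word in changes:
    print(f"{iface}: switched to {duplex}-duplex (link partner word {word})")
```

To use it on real output, save the dmesg text to a file and pass the open
file (or a list of its lines) to summarize() in place of the sample list.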

I got this same transmit error on the server before I upgraded it.
Everything conclusive I can find points toward a problem with 10/100
full/half duplex mode negotiation.  Please, anyone, help me solve this
problem.  I've tried two different versions of 3c59x.c and the problem
still exists.  I've even run new CAT-5e cable.  I've seen posts from people
saying they have similar problems, but I've never seen an answer.  My
company doesn't want to go out and buy a Cisco router or change my servers
to NT, and neither do I, but we can't keep having this problem.  It's
making me and Linux look really bad (they are skeptics and
Linux-illiterate).  I don't know what to say.  Please, can anyone shed some
light on this?
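
Since the suspicion is duplex negotiation, it may help to read the hex
words in the dmesg output bit by bit.  The sketch below assumes that the
"status" value is the standard MII basic status register and that the
"link partner capability" value is the standard auto-negotiation link
partner ability register; the bit meanings are the generic IEEE 802.3
ones, not anything specific to the 3c59x driver, and the table and
function names are made up for illustration.

```python
# Generic MII bit layouts (IEEE 802.3); assumed to apply to the values
# printed by the driver.  Names on the right are informal labels.
LPA_BITS = [
    (0x8000, "next-page"),
    (0x4000, "ack"),
    (0x2000, "remote-fault"),
    (0x0400, "pause"),
    (0x0200, "100baseT4"),
    (0x0100, "100baseTx-FD"),
    (0x0080, "100baseTx-HD"),
    (0x0040, "10baseT-FD"),
    (0x0020, "10baseT-HD"),
]

BMSR_BITS = [
    (0x8000, "100baseT4-capable"),
    (0x4000, "100baseTx-FD-capable"),
    (0x2000, "100baseTx-HD-capable"),
    (0x1000, "10baseT-FD-capable"),
    (0x0800, "10baseT-HD-capable"),
    (0x0040, "preamble-suppression"),
    (0x0020, "autoneg-complete"),
    (0x0010, "remote-fault"),
    (0x0008, "autoneg-capable"),
    (0x0004, "link-up"),
    (0x0002, "jabber"),
    (0x0001, "extended-registers"),
]


def decode(word, table):
    """Return the labels of every bit in `table` that is set in `word`."""
    return [name for mask, name in table if word & mask]


# Values copied from the dmesg output in this post.
for text in ("4081", "4101"):
    print("link partner", text, "->", decode(int(text, 16), LPA_BITS))
print("status 786d ->", decode(0x786D, BMSR_BITS))
```

On that reading, 4081 means the link partner advertised only 100baseTx
half duplex and 4101 means it advertised only 100baseTx full duplex, so
what the switch port advertised differed between the two messages.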

Thank you,
Sincerely,

Joshua Montee
----------------------
PC & Networking Engineer
Perfect Call Promotions and Inscentives, Inc.
2441 Bellevue Ave.
Daytona Beach, FL 32114
Phone: (386) 271-3100
Fax: (386) 271-3003
E-Mail: jmontee@dmenet.com