bug in speedo_tx_timeout

Yisrael yhersch@allot.com
Wed Feb 2 09:55:59 2000


Donald, Fred, and company,

After several days of experimenting and code changes, these are
my findings...

I don't need to write to the ISOLATE bit of the control register. I don't
see any difference in the behaviour of the NICs. Maybe something very
subtle is occuring that I'm not spotting.

However, I DO need to write to the [read only] status register. If I don't
the NIC eventually locks up.

Furthermore, it seems that you're right Fred, I too don't see a need
for waiting for the reset bit to clear. At least in my tests the last
few days, it seems to clear the first time, every time. Go figure. For
now, I'm leaving in the short loop I added, since I'm still suspicious
of the behaviour of the Intel NICs.

I would like to propose a minor change to the MII reset code. I am
currently using the following code...

01  /* Reset the MII transceiver, suggested by Fred Young @ scalable.com. */
02  /* with proposed changes */
03  if ((sp->phy[0] & 0x8000) == 0) {
04
05      int w;
06
07      int phy_addr = sp->phy[0] & 0x1f;
08      int advertising = mdio_read(ioaddr, phy_addr, 4);
09      int mii_bmcr = mdio_read(ioaddr, phy_addr, 0);
10
11  //  mdio_write(ioaddr, phy_addr, 0, 0x0400);
12      mdio_write(ioaddr, phy_addr, 1, 0x0000);
13      mdio_write(ioaddr, phy_addr, 4, 0x0000);
14      mdio_write(ioaddr, phy_addr, 0, 0x8000);
15  #ifdef honor_default_port
16      mdio_write(ioaddr, phy_addr, 0, mii_ctrl[dev->default_port & 7]);
17  #else
18      // wait for reset bit to clear
19      for(w=1000; w>0; w--) {
20          if( (mdio_read(ioaddr, phy_addr, 0) & 0x8000) == 0 )
21              break;
22      }
23      mdio_write(ioaddr, phy_addr, 0, mii_bmcr);
24      mdio_write(ioaddr, phy_addr, 4, advertising);
25  #endif
26  }

line 05 - used for a loop in lines 19-22

line 11 - I find that I don't need this write to the Isolate bit. Since
   I have this in my documentation as "reserved, should be zero", it
   makes me nervous to write to this bit. I may put this back in later.

line 12 - Fred, you're a genius. If I don't have this line of code in the
   driver, the NIC eventually rolls over and plays dead. How did you ever
   discover that writing to a read only register is required?

line 19 - Fred, although I think I agree with you that waiting for the
   bit to clear isn't necessary, from experience, if the datasheet says
   to do something, you'd better do it. So therefore, I've added this loop
   to wait for the bit to clear. Not only that, but as the CPUs get
   more powerful (1 GHz and higher), this wait might be needed. I imagine
   that a loop of a thousand is ridiculous overkill, but it probably
   doesn't matter one way or the other.

All in all, not much different than yours Fred. Just a minor adjustment.


===> Now for the biggie...

As I mentioned, we've been having problems with our NICs refusing to talk
to anyone else. Having a mother-in-law that refuses to talk is OK, however
a NIC that shuts up is sort of useless.

One thing that I noticed was that if the system came up without a cable
connected to the NIC, ifconfig indicated that the NIC was running (link
on), even though it wasn't even connected.

I decided to play around with the "sticky" link bit and guess what, it
really is sticky. Or rather, it seems that it's slippery. Below is the
relevant section of code from the routine speedo_tx_timeout...

01  /* We have MII and lost link beat. */
02  if ((sp->phy[0] & 0x8000) == 0) {
03      int partner = mdio_read(ioaddr, phy_num, 5);
04//    if (partner != sp->partner) {
05      if (partner != sp->partner || sp->partner == 0) {
06          int flow_ctrl = sp->advertising & partner & 0x0400 ? 1 : 0;
07          sp->partner = partner;
08          if (flow_ctrl != sp->flow_ctrl) {
09              sp->flow_ctrl = flow_ctrl;
10              sp->rx_mode = -1;	/* Trigger a reload. */
11          }
12
14          /* Clear sticky bit. */
15          mdio_read(ioaddr, phy_num, 1);
16          /* If link beat has returned... */
17          if (mdio_read(ioaddr, phy_num, 1) & 0x0004)
18              dev->flags |= IFF_RUNNING;
19          else
20              dev->flags &= ~IFF_RUNNING;
21      }
22  }

It seems that if register 1 is only read once, the bit always shows
as zero (does that mean it's slippery instead of sticky?), but if
it's read twice (lines 15 and 16), then it gives the proper value.

However, that's still not the biggie. Here's the biggie...

==> As far as I understand, line 04 checks if the NIC detected a
change in partner (square dancing anyone?). If the partner changed,
then we've got to reinitialize. But, it seems that line 04 doesn't
take into account the situation that our NIC might not have had a
partner. That is, the NIC at this point in time either had no
previous partner, or forgot about its previous relationship (burned
the pictures and tore up the theater stubs). I added the check (in
line 05) to see if our NIC has a partner or not. If the NIC has no
partner, the value is 0 (after all, without a partner, we're nothing).
Once I added this check, the driver works great.

>From what I've seen, under heavy loads the NIC transmit somehow craps
out. The tx timeout is invoked, and the driver tries to get things
going again. However, if the driver thinks that it didn't have a
previous partner, then things don't get initialized properly.

With the check for partner equal to zero (no previous partner), a
critical section of the recovery code is now executed allowing the
NIC to properly recover from the transmit timeout.

Whereas before we were experiencing transmit shutdowns as a result of
heavy traffic, the NIC now keeps working in spite of my beating on it
with heavy packet storms. I do see that from time to time the NIC
becomes quiet, but then it picks up again and continues to work.
Before, once the NIC became quiet, that was it, it remained quiet.


Thanks for listening, I hope this helps,

Yisrael (Russ) Hersch
Allot Communications

-------------------------------------------------------------------
To unsubscribe send a message body containing "unsubscribe"
to linux-eepro100-request@beowulf.org