[vortex] Re: Case 254686: GX150 autonegotiate problem

Bill Cattey wdc@MIT.EDU
Tue Jan 29 19:47:02 2002


Summary:

I read a fair bit of code and am discovering this is perhaps a more
complex problem than it seemed at first.

It would be a BIG help if someone (MIT NetOps?) could help us figure out
a way to remotely identify systems that are having a problem. It would
greatly help in scoping the problem, and in targeting relief while we
search for a long term solution.

Detail:

Since others are focused on other work, I have attempted to get
more understanding of the network performance problem manifested
by some of our GX150's.

A real obstacle to moving forward is the divergence between the Becker
work and the Red Hat work.  Becker responds more quickly and more
usefully than Red Hat, but we really want to avoid tying ourselves to a
non-standard driver for our multi-year deployment cycles.

Becker's original suggestion was to try his current alternate driver. 
Meanwhile I'd asked Red Hat to look into merging in functionality from
Becker, on the assumption that the divergence was simple and obvious.

I took it upon myself to read code and have some apparently depressing
findings to report:

The simple symptom Becker noticed about bogus transceiver status was
handled in the Red Hat revision LK1.1.12 1 Jan 2001 andrewm (2.4.0-pre1):

    /*
    * For the 3c905CX we look at index 24 first, because it bogusly
    * reports an external PHY at all indices
    */
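
To make the probe order in that comment concrete, here is a minimal
sketch in Python rather than driver C. It is not the driver's code. The
names phy_probe_order, first_phy and read_status are hypothetical, and it
assumes the 32 MII addresses 0-31, that the remaining indices are scanned
in ascending order, and that a status word of 0x0000 or 0xffff means no
PHY answered at that index.

```python
def phy_probe_order(preferred=24, count=32):
    """Return MII PHY indices with `preferred` first, then the rest ascending."""
    return [preferred] + [i for i in range(count) if i != preferred]


def first_phy(read_status, preferred=24, count=32):
    """Return the first index whose status word looks like a real PHY.

    `read_status` is a hypothetical callback standing in for an MII
    register read; it takes a PHY index and returns a 16-bit status word.
    """
    for idx in phy_probe_order(preferred, count):
        status = read_status(idx)
        # Assumption: all-zeros or all-ones means nothing answered here.
        if status not in (0x0000, 0xFFFF):
            return idx
    return None
```

On a chip that answers at every index, scanning from 0 would pick
index 0; trying 24 first picks the index the comment says to prefer.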

Eventually, someone at Red Hat will report to me what I infer from
another comment in the current rev of Red Hat 3c59x.c:

       LK1.1.16 18 July 2001 akpm
        - Don't reset the interface logic on open/close/rmmod.  It upsets
          autonegotiation, and hence DHCP (from 0.99T).

that they are in sync with Becker's code as of 0.99T. (The current
revision is 0.99U. I've looked at it and at the experimental version
0.99V, which is no different from 0.99U in the autonegotiate code.)

I think a fair thing to do before we expect more help from Becker is to
try his 0.99U driver.  Since the obvious symptom seems resolved, we get
back to an earlier observation he and I shared:

Excerpts from pm.linux: 10-Jan-102 Re: [3c509] Question about .. Donald
Becker@scyld.com (2422*)

> > (Mind you, the next line in dmesg output said:
> >   ***WARNING*** No MII transceivers found!

> That's a bad message -- there could be a timing issue at work here.

Alas, there is no athinfo remote query to see dmesg, so I guess someone
will have to visit machines and find out if that message correlates with
the performance problem.
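
Whoever visits the machines could save each one's dmesg output and
check the lot in one pass. Here is a minimal sketch in Python, assuming
the warning appears in the log with the wording quoted above. The names
has_mii_warning and classify_hosts are hypothetical, and how the logs
get collected is left open.

```python
import re

# Wording taken from the dmesg line quoted in this thread; the exact
# text a given driver revision prints is an assumption.
WARNING_PATTERN = re.compile(r"No MII transceivers? found", re.IGNORECASE)


def has_mii_warning(dmesg_text):
    """Return True if the saved kernel log contains the MII warning."""
    return WARNING_PATTERN.search(dmesg_text) is not None


def classify_hosts(logs):
    """Map hostname -> True/False, given a dict of hostname -> dmesg text."""
    return {host: has_mii_warning(text) for host, text in logs.items()}
```

Comparing that per-host result against which machines show the
performance problem would tell us whether the two actually correlate.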

It does indeed seem like we are onto a timing problem.  Even if Becker's
alternate driver makes the problem go away, it doesn't say what change
needs to be made to the standard Red Hat driver to solve the problem. 
Instead it would lock us into a single-file, single-platform hack we
have to preserve across every Red Hat update we take for the next 4
years until the GX150's are all gone.

So everyone I've talked to is very busy and does not seem able to take
ownership of solving this problem.  We need to work together to better
understand what exactly is going on.  The problem is NOT a simple "get
Red Hat to update the driver" issue.

We probably need a better handle on reproducing the problem so we can
make a case for Red Hat to put energy behind working on it.

I'm copying the vortex driver development list with this note to keep
the development community in the loop, and in the hopes that I've missed
something simple and obvious that can lead to a quick and easy solution.

-wdc