[eepro100] Re: system hangs @ boot-time when bringing up eth0

Donald Becker becker@scyld.com
Fri, 8 Sep 2000 10:05:07 -0400 (EDT)


On Fri, 8 Sep 2000, Andrey Savochkin wrote:

> > I hate to be such a squeaky wheel on this issue, but it is literally
> > costing my company time and money everytime someone has to drive out to

> I don't consider the issue as so serious.

This *is* a serious issue.  This is a major released kernel version, with
many users relying on the capabilities and reliability of the kernel.

> Each time you upgrade the kernel you take some risk, so you must be prepared
> to revert to the previous kernel quickly.  Or you may ask what's the problem
> and be advised how to work-around it in a most convenient way or which driver
> versions to pick up.

There is always some risk, but we shouldn't carelessly make the risk higher.

Consider the kernel as a hundred subsystems, each of which is "working" or
"flaky".  Just one "flaky" makes the whole kernel unusable for serious
work.
  With "flaky" probability 0.05, we are unlikely to ever get a good kernel.
  With "flaky" probability 0.01, we have a chance to stabilize.
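
A rough sketch of that arithmetic, assuming the hundred subsystems are
flaky independently of one another (the independence assumption and the
function name are illustrative, not part of the original argument):

```python
def p_all_working(p_flaky: float, n_subsystems: int = 100) -> float:
    """Chance that none of n independent subsystems is flaky."""
    # Each subsystem works with probability (1 - p_flaky); under the
    # independence assumption the whole kernel is good only if all work.
    return (1.0 - p_flaky) ** n_subsystems


if __name__ == "__main__":
    for p in (0.05, 0.01):
        print(f"flaky probability {p:.2f}: "
              f"P(good kernel) = {p_all_working(p):.4f}")
```

Under those assumptions a per-subsystem rate of 0.05 leaves well under a
1% chance of a fully good kernel, while 0.01 leaves roughly 37%.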

The way to avoid this is to test the driver, and proposed driver fixes,
outside the main kernel development.  That means having a support web page,
setting up mailing lists, and having almost-released, beta, alpha and
targeted test versions.  Which is what I've been doing since 1995.

> Certainly, I regret that the driver has the defect and the inconvenience it
> caused, but the issue doesn't worth more than a short complain :-)

That's part of what you signed up for by splitting off your branch of driver
development.  I give you credit for not doing the usual "patch and run".
But avoiding introducing new bugs like this was why I had the multi-tiered
development structure for the individual drivers.  It was much more work for
me, but in the long run it's the only rational way to do driver development.

The problem was that from the end-user viewpoint (Linus) it appeared that
new versions came out only rarely.  He never saw, and never had to see, the
large numbers of test versions sent out to see if individual problems were
fixed, or the beta test versions to verify that the fixes worked for most
people.

> > more curious whatever the driver *was* doing still can't be done that
> > way, since it seemed to work.  

> I believe (however, not absolutely sure without the documentation) that the
> bug has existed in the driver for a very long time, a few years.
> The sporadic faults started to appear only recently because the operation
> timings were changed by innocent and unrelated changes.

If it's a timing problem, and it didn't show up with the previous timing,
was it a bug before?

Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Beowulf-II Cluster Distribution
Annapolis MD 21403