[vortex] Re: 3com 3c905c-txm

Jeff Garzik jgarzik@mandrakesoft.com
Sun, 14 May 2000 18:40:24 -0400


Donald Becker wrote:
> On Sat, 13 May 2000, Jeff Garzik wrote:
> > pathetic when you do this weekly if not daily on public mailing lists.
> > You're a CTO?  Really?

> [[ Wow.  Most people wouldn't be so foolish as to get into a
> titles/credentials/experience/been-there-done-that challenge with me. ]]

It was an expression of surprise that a CTO with your credentials would
stoop to harping over the 2.3.x net driver situation on public mailing
lists:  three times in two days, in three different threads, at the time
I responded.

Donald, you are without question one of the sharpest people I've ever
met when it comes to x86 hardware and network hardware.  You also
without question have one of the biggest egos in the open source
community, which makes you incredibly hard to work with.

You have to realize that you cannot make all the design decisions for
the kernel.
You have to realize that sometimes you are wrong.
And you also have to realize that even if bad design decisions are made,
your actions and attitude are totally the wrong way to go about
reversing those decisions.


> > Informed people realize that:
> > * You are unable to maintain all the network drivers and keep up with
> > all network driver issues on a timely basis.  Simply unscalable.
> > * The nature of the Linux kernel cannot wait for you to update your
> > network drivers.
> 
> These two issues are tied together.  You are suggesting (and you are known
> to follow your own suggestion) a shotgun approach of frequently making
> changes, and hoping one of them will fix the problem.  People see
> change, and assume it's progress.  Instead we end up with a series of broken
> drivers, each with a different set of bugs.

Sorry, you yourself have proven that the incremental approach is
superior to the "one big update with tons of changes" approach.  I won't
bother referencing examples from Linus and others.

You are also ignoring a concrete and real example:  the softnet
changes.  These were changes during a development cycle which required
updating network drivers.  Since you decry driver updates from others,
the only other possible solution for end users during an interface
change like that is to wait for you to update and test all the net
drivers.  Since that testing could take many months, users are left with
broken network drivers for many months (without independent updates).

To avoid having users with broken net drivers, and to avoid others
updating your own network drivers, the only solution to ever updating
the net driver API is to wait on you for all design decisions and
network driver implementations.  i.e. you control more and more of the
process.

Open source is about giving up control.  Letting others' wisdom and
experience make your code even better.  Letting others test your
development-quality code.  One of the best things about the open source
world and the Linux kernel is the quality and quantity of feedback.  You
just can't match that in private testing and designs of your own making.


> > * You are being a control freak if you (a) do not want the networking
> > API to change at all, or
> 
> I want the API to change, but only in carefully considered ways.
> 
> Interface changes are *expensive*.  Well, they are not expensive to you,
> since you make them without testing the results.  But every interface change
> takes days or weeks of my time to test the driver updates.

Interface changes are expensive, but since we are in a development
kernel series, it is openly stated that interfaces can and will change. 
The only holy grail is userland binary compatibility.

At some point a reluctance to change means valid and necessary progress
is impeded.


> > (b) want all users of development kernels to
> > wait for you to update your network drivers.
> 
> The control issue was very real.  I wanted the driver development to
> continue as it had for years:
>     based on driver-specific mailing lists, and
>     using drivers backwards compatible with stable kernels.
> This was a scalable model that reduced the amount of interaction required
> between developers.

It now seems apparent that "reduced amount of interaction" translates to
"I want to control the API and the drivers."

It is also clear that a reduced amount of interaction between yourself
and other kernel developers has led to a situation where you do not
fully understand the 2.3.x kernel API.


> Linus wanted to pull *all* development, including drivers, into the big
> kernel tree.  Updated drivers would occasionally be back-ported to older
> kernels.

Not necessarily all drivers, AFAICS, just core drivers people would need
to boot and do useful stuff, like say use the network for instance :)

I don't think the user community would like it if eepro100, tulip, and
all the other net drivers you develop suddenly disappeared from the
kernel.  People vote with their feet:  which gets more use, the
drivers from your Website, or the kernel drivers?


> I felt that this would result in unstable kernel development, and
> increasingly long kernel development cycles.  Linus wanted to release the
> 2.4 before the end of 1999, and felt that developers wouldn't focus on that
> goal unless cross-kernel compatibility was removed.  A "burn the boats"
> approach to overcome the inherently exponential growth to unified,
> centralized development.

Entropy is always a problem.  Your solution seems to be hiding entropy
with compatibility cruft (until the system fails completely and utterly,
or gets completely redesigned).


> A curious thing about observing a noisy system: by the time you notice that
> some component has exponential growth, it's already too late to do anything
> about it.  When is 2.4 supposed to be released?

Talk to the Linux kernel marketing manager... :)


> > * You sometimes ignore obvious bug fixes and clear bug reports (even
> > from Linus or Alan).
> 
> There were items submitted as bug fixes that didn't actually fix anything.
> If there isn't a plausible mechanism, you haven't found causality.

There is a "right" bug fix and a pragmatic bug fix.  The pragmatic bug
fix is one that keeps your system working and not crashing until the
right bug fix is available.  No need to limp along when you can jog.


> > I would be PERFECTLY HAPPY to leave kernel net drivers completely alone
> > -- gives me more time for other kernel hacking.
> 
> Now that they are modified, and you have found out how difficult they are to
> get right for all cards on all machines, you are happy to let me be a
> maintenance programmer for the now-broken code?  How very generous.

Oh good grief.  First, your exaggeration of "brokenness" masks the fact
that many of the 2.3.x net drivers contain important bug fixes and
changes as well.

More importantly, your statement shows your ignorance of how kernel
development works.  If you are maintaining the driver, you can do
whatever you damn well please with it, including

	cp ~becker/tulip.c drivers/net #overwrite 2.3.x driver

IMHO the main issue that keeps your current net drivers out of the 2.3.x
kernel is your PCI scan infrastructure.  All of the other differences
are generally small potatoes:  fixing a kmalloc without kfree, using
resource allocation correctly, etc.

(of course a side issue when maintaining drivers is to make sure you
don't break the driver used on any of Linus' computers :))


> >  The sad fact is, if I,
> > and Andrew, and Andrey, and others quit maintaining the drivers, there
> > will be no one maintaining the kernel net drivers.  You certainly aren't
> > doing anything to improve the kernel net drivers, and haven't for a long
> > time now (longer than I've been hacking on the net drivers anyway).
> 
> That's obviously false.  Snippets of my updated drivers are frequently put
> into the kernel.

Yeah, often by me.

I think it's funny how you claim credit for contributions of your code
made to the kernel, and then turn around and complain about it.


> > Where are your patches to Linus, Donald?
> 
> http://www.scyld.com/network/index.html
>   ftp://scyld.com/pub/network/*
> 
> No, they don't include the various changes inserted by others in the various
> 2.3 drivers thrashings.

Take my question literally.

One contributes to the Linux kernel by e-mailing patches to Linus.  Are
you planning to do this for any of the network drivers?


> >  We all know from your harping
> > of two attempts at pushing a gargantuan, under-discussed, and buggy
> > patch through.
> 
> You mean the pci-scan code.  That was working, tested code.

Fact:  it worked
Fact:  it was tested

Fact:  it has (present tense) known bugs
Fact:  it duplicated or circumvented existing kernel infrastructure


> At the time I
> had written Linux drivers that support more types of PCI-hot-swap/CardBus
> card than anyone else.  I believe it was more card types than everyone
> else combined.  Linus apparently didn't like that the code included
> backwards compatibility, when he wanted to focus on 2.3->2.4.

Current 2.3.x kernel drivers can use a compatibility layer to work on
older kernels.


> >  Why didn't you want to work with Linus and Alan to get
> > the issues resolved?  I wasn't around then, so you can't blame me this
> > time :)
> 
> Yes, Jeff, you have won.  Linus does have the decision making power here.
> When I state that a patch set to my code is flawed, and Linus puts it in
> anyway, he is making a decision.  The only power I have is to decide if I will
> implicitly endorse those changes, or not work with the modified versions.
> In this case the kernel direction, taken as a whole, was technically
> ill-considered enough that I felt it was untenable.

No, entropy is winning and I (and others) are just trying to keep up.

So Linus made a decision you don't like.  It happens to us all :)  But
you are currently pulling hard against a door marked "pull."  The
current infrastructures and changes have been tacitly if not
explicitly endorsed by everyone but you.  We would all welcome a
technical discussion of the points you dislike about the current PCI and
softnet infrastructures, instead of constant complaining about a
situation you're not trying to change...

	Jeff




-- 
Jeff Garzik              | Liberty is always dangerous, but
Building 1024            | it is the safest thing we have.
MandrakeSoft, Inc.       |      -- Harry Emerson Fosdick