3com 3c905c-txm

Donald Becker becker@scyld.com
Sat May 13 16:37:06 2000


On Sat, 13 May 2000, Jeff Garzik wrote:
> Donald Becker wrote:
> > As I mentioned in the previous reply to the linux-vortex mailing list, the 
> > 3c59x.c code in 2.3.99 is completely bogus. 
> > You should ignore it completely and look at 
> >     http://www.scyld.com/network/vortex.html 
> >     ftp://www.scyld.com/network/3c59x.c 
> > as the working method for using the chip. 
> 
> > I shouldn't be pointing out specific changes that were bad, because people
> > will assume that the rest of the changes were OK.  Generally, if one set of
> > changes are badly designed, it means that all of the changes should be
> > reconsidered.
> 
> Since you are not doing anything to change the situation, continual
> complaints without action on your part are called whining.  It's REALLY

This is a specific technical issue:
I pointed out the *very bad* bug in the transmit code, and the code that
operates the chip correctly (the original code).
There are other problems, but this is the one that ended up on the kernel
list.

You didn't even respond to the technical issue at root, instead leaping to
attack me personally.

It took me about four hours to test that my driver did not have the bug
reported, and that the 3Com chips really did seem to be behaving as the
documentation said they should.  Only later did I find out that it was the
modifications in 2.3.99 causing the problem.  This whole issue has taken
about eight hours.

> pathetic when you do this weekly if not daily on public mailing lists. 
> You're a CTO?  Really?

"Ever since I left my job as a scientist at NASA-Goddard."

A brief review of my credentials, since that does reveal something about
people's capabilities:

I started working in parallel/distributed computing in 1983, designing
and building performance monitoring boards for the MIT Concert
multiprocessor.  (With 64 68000 processors, it was one of the largest of the
era.) 

I started the Beowulf project about six years ago.  That research effort has
very successfully focused the larger community's effort on commodity-based,
high performance computing.  Along the way we have produced a large body of
widely used software, including most of the commonly used Linux network
drivers.

One aspect of putting together large, reliable clusters is that the system
must be fast and stable.  Networking, and specifically the drivers, is a
key element in utilizing systems originally designed for stand-alone use.

[[ Wow.  Most people wouldn't be so foolish as to get into a
titles/credentials/experience/been-there-done-that challenge with me. ]]

> Informed people realize that:
> * You are unable to maintain all the network drivers and keep up with
> all network driver issues on a timely basis.  Simply unscalable.
> * The nature of the Linux kernel cannot wait for you to update your
> network drivers.

These two issues are tied together.  You are suggesting (and you are known
to follow your own suggestion) a shotgun approach of frequently making
changes, and hoping one of them will fix the problem.  People see
change, and assume it's progress.  Instead we end up with a series of broken
drivers, each with a different set of bugs.

Rationally you should be happier waiting for a single, tested fix rather
than spending even more time dealing with three flawed updates that break
other things along the way.  But in the "press release" view of the world,
multiple "fixes" (later found to be flawed) represent excellent
responsiveness.

Imagine having roughly every other update being usable.  You might believe
that's a workable number.
Now imagine having 2000 of those elements, or even just 10, that you must
assemble into a system.
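
As a rough sketch of that arithmetic, assume each element's update is usable
independently with probability 0.5.  Both the independence and the exact
figure are simplifying assumptions, and the function name is made up for
illustration.  The chance that every element works at once is then:

```python
import math


def log10_all_usable(per_element: float, count: int) -> float:
    """Base-10 log of the chance that `count` independent elements all work.

    Assumes each element is usable with probability `per_element` and that
    failures are independent (an illustrative simplification).
    Working in logs avoids floating-point underflow for large counts.
    """
    return count * math.log10(per_element)


for count in (1, 10, 2000):
    print(count, log10_all_usable(0.5, count))
```

Under those assumptions ten elements all work together about once in a
thousand tries (1/1024), and for 2000 elements the base-10 log of the
probability is around -602.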

> * You are being a control freak if you (a) do not want the networking
> API to change at all, or

I want the API to change, but only in carefully considered ways.

Interface changes are *expensive*.  Well, they are not expensive to you,
since you make them without testing the results.  But every interface change
takes days or weeks of my time to test the driver updates.

> (b) want all users of development kernels to
> wait for you to update your network drivers.

The control issue was very real.  I wanted the driver development to
continue as it had for years:
    based on driver-specific mailing lists, and
    using drivers backwards compatible with stable kernels.
This was a scalable model that reduced the amount of interaction required
between developers.

Linus wanted to pull *all* development, including drivers, into the big
kernel tree.  Updated drivers would occasionally be back-ported to older
kernels.

I felt that this would result in unstable kernel development, and
increasingly long kernel development cycles.  Linus wanted to release
2.4 before the end of 1999, and felt that developers wouldn't focus on that
goal unless cross-kernel compatibility was removed.  A "burn the boats"
approach to overcome the exponential growth inherent in unified,
centralized development.

A curious thing about observing a noisy system: by the time you notice that
some component has exponential growth, it's already too late to do anything
about it.  When is 2.4 supposed to be released?

> * You sometimes ignore obvious bug fixes and clear bug reports (even
> from Linus or Alan).

There were items submitted as bug fixes that didn't actually fix anything.
If there isn't a plausible mechanism, you haven't found causality.

It's frequently true that restarting the machine or loading a new driver
causes problems to go away without actually fixing anything.  I won't put in
changes just because they *might* fix a problem.

Unfortunately Linus doesn't always 

> I would be PERFECTLY HAPPY to leave kernel net drivers completely alone
> -- gives me more time for other kernel hacking.

Now that they are modified, and you have found out how difficult they are to
get right for all cards on all machines, you are happy to let me be a
maintenance programmer for the now-broken code?  How very generous.

>  The sad fact is, if I,
> and Andrew, and Andrey, and others quit maintaining the drivers, there
> will be no one maintaining the kernel net drivers.  You certainly aren't
> doing anything to improve the kernel net drivers, and haven't for a long
> time now (longer than I've been hacking on the net drivers anyway).

That's obviously false.  Snippets of my updated drivers are frequently put
into the kernel.

> Where are your patches to Linus, Donald?

http://www.scyld.com/network/index.html
  ftp://scyld.com/pub/network/*

No, they don't include the various changes inserted by others in the various
2.3 driver thrashings.

>  We all know from your harping
> of two attempts at pushing a gargantuan, under-discussed, and buggy
> patch through.

You mean the pci-scan code.  That was working, tested code.  At the time I
had written Linux drivers that supported more types of PCI hot-swap/CardBus
cards than anyone else.  I believe it was more card types than everyone
else combined.  Linus apparently didn't like that the code included
backwards compatibility, when he wanted to focus on 2.3->2.4.



>  Why didn't you want to work with Linus and Alan to get
> the issues resolved?  I wasn't around then, so you can't blame me this
> time :)

Yes, Jeff, you have won.  Linus does have the decision making power here.
When I state that a patch set to my code is flawed, and Linus puts it in
anyway, he is making a decision.  The only power I have is to decide whether
I will implicitly endorse those changes, or not work with the modified versions.
In this case the kernel direction, taken as a whole, was technically
ill-considered enough that I felt it was untenable.

Donald Becker				becker@scyld.com
Scyld Computing Corporation
410 Severn Ave. Suite 210
Annapolis MD 21403

