[eepro100] Re: system hangs @ boot-time when bringing up eth0

Derek Glidden dglidden@illusionary.com
Fri, 08 Sep 2000 00:31:12 -0400

> "Krawl, Roeland" wrote:
> I believe that the "No buffers" and "No resources" problem would get
> fixed a lot sooner if Intel would permit the free flow of information
> about controlling the 82557, 82558, and 82559. Perhaps the day will
> come when a programmer's model of the chip can be obtained without
> having to sign an NDA to obtain the super secret programmer's model.
> It would seem logical that Ethernet chip manufacturers would want to
> assist driver developers in any way they could to improve the
> robustness of drivers and to boost sales/popularity of their chips.

Logical yes, but since when does a marketing-driven company follow
logic?  :)

I hate to be such a squeaky wheel on this issue, but it is literally
costing my company time and money every time someone has to drive out to
physically reset a remote machine that did not come up on boot because
the EEPro initialization has hung.  And it is probably going to cost us
even more time and money either replacing a bunch of previously-stable
eepro cards with something else or trying to back-rev the kernels on all
our boxes to a known-stable version.

I have gotten notes back that this is a "known issue" but that there is
no reasonable solution for it yet.  I can understand if Intel does not
want to release full specs for the card to resolve it entirely, but at
the same time, I don't understand why I've only recently begun seeing
this problem.  And I *know* that it has only recently cropped up in the
"stable" 2.2 kernel series because the frequency at which we see it
*guarantees* that we would have seen it *at least once* with earlier
kernel/driver versions.  So something has changed in a previously stable
driver to make it extremely unstable.  (And in the kernel "stable"
branch no less...)  I'm more curious why whatever the driver *was* doing
can't still be done that way, since it seemed to work.

(I do understand that sometimes something works that's just plain wrong,
as once happened with a PCMCIA NIC I had - it stopped working with a new
PCMCIA package and when I asked, David Miller told me that its voltage
was being incorrectly reported and while it worked 99% of the time,
there was a small chance that it could have fried the card by sending it
too much juice.  Fixing the driver to report the correct voltage broke
the rest of the driver somehow until the next rev. I don't see how there
could be an issue like this with the eepro driver, however.)

I'm not a bad programmer, but certainly not in the class of
driver/hardware hacking, otherwise I'd take a shot at working it out

> Can anyone say concretely that the "No buffers" and "No resources"
> problem occurs only with the 82559 chip?

I can say reasonably concretely that it does not.  I have run into that
bug several times just today on a machine with a card that reports
itself as:

cat /proc/pci
Bus  0, device  14, function  0:
  Ethernet controller: Intel 82557 (rev 8).
    Medium devsel.  Fast back-to-back capable.  IRQ 9.  Master Capable.  Latency=64.  Min Gnt=8.Max Lat=56.
    Non-prefetchable 32 bit memory at 0xefffe000 [0xefffe000].
    I/O at 0xcc00 [0xcc01].
    Non-prefetchable 32 bit memory at 0xefe00000 [0xefe00000].

I say "reasonably" concretely because I'm assuming that the card/chipset
is reporting itself correctly in /proc/pci.
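
For anyone who wants to collect the same information from a batch of
machines, here is a minimal Python sketch that pulls the Ethernet
controller entries out of /proc/pci-style text.  It assumes the layout
shown in the listing above; the function name and the regular
expressions are my own illustration, not anything taken from the driver
or the kernel.  It only repeats what the kernel reports, so the same
caveat about the card identifying itself correctly applies.

```python
import re

# Sample text copied from the /proc/pci listing quoted above.
SAMPLE = """\
Bus  0, device  14, function  0:
  Ethernet controller: Intel 82557 (rev 8).
    Medium devsel.  Fast back-to-back capable.  IRQ 9.  Master Capable.  Latency=64.  Min Gnt=8.Max Lat=56.
    Non-prefetchable 32 bit memory at 0xefffe000 [0xefffe000].
    I/O at 0xcc00 [0xcc01].
    Non-prefetchable 32 bit memory at 0xefe00000 [0xefe00000].
"""

# Assumed line formats, based only on the listing above.
LOCATION_RE = re.compile(r"^Bus\s+(\d+), device\s+(\d+), function\s+(\d+):")
ETHERNET_RE = re.compile(r"^\s+Ethernet controller: (.+?) \(rev (\d+)\)\.")


def find_ethernet_controllers(text):
    """Return a (bus, device, function, name, rev) tuple per Ethernet entry."""
    found = []
    location = None
    for line in text.splitlines():
        m = LOCATION_RE.match(line)
        if m:
            # Remember the most recent "Bus ..., device ..., function ...:" header.
            location = tuple(int(g) for g in m.groups())
            continue
        m = ETHERNET_RE.match(line)
        if m and location is not None:
            found.append(location + (m.group(1), int(m.group(2))))
    return found


if __name__ == "__main__":
    for bus, dev, fn, name, rev in find_ethernet_controllers(SAMPLE):
        print(f"{bus}:{dev}.{fn} {name} rev {rev}")
```

On a kernel that provides /proc/pci, the contents of that file could be
passed in place of SAMPLE.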

With Microsoft products, failure is not           Derek Glidden
an option - it's a standard component.      http://3dlinux.org/
Choose your life.  Choose your            http://www.tbcpc.org/
future.  Choose Linux.              http://www.illusionary.com/