[eepro100] No Resources / RX Buffers bug

Derek Glidden dglidden@illusionary.com
Tue, 29 Aug 2000 13:41:18 -0400


Hello,
I've been reading through the list archives and scanning various
websites for information regarding this (apparently well-known) EEpro100
driver "issue."  We've been having lots of problems lately with this
driver in recent kernels using hardware that has proved to be extremely
stable in the past.  

We're consistently seeing lockups during boot on our machines from this
"Card reports no resources/Card reports no RX buffers" problem.  This
has been occurring on about 10% of the occasions we've booted machines
that have EEPro cards in them.  Quite a few of them had been working
flawlessly for at least the last year or so, without any problems
whatsoever, until we upgraded to newer kernels, so it seems to be
tied to more recent versions of the EEPro driver in recent versions of
the kernel.  I think we've started to see problems since about 2.2.14 or
so.  If the card actually initializes properly, the machine is extremely
stable; if it does not, the only way out is to Big Red Button the thing,
which can be extremely difficult when the machine is remotely colocated.

(Yeah, I know we probably shouldn't be upgrading software on hardware
that has worked flawlessly in the past.  Some of these machines we're
simply upgrading to newer releases for security reasons, others we're
rebuilding to serve other functions so we're re-installing while we
re-deploy and some _are_ altogether new machines, but using the same
revisions of hardware we've had good luck with in the past - when we
find hardware that works well, we tend to buy them in caseloads so we
have them when we need them.  In all cases, the machines need to be as
close to 100% reliable as possible and have been so in the past, which
is why we've been exclusively using EEPro cards.  With this card lockup
problem, we no longer see that reliability.)

I've personally seen this problem occur on at least five different
machines on many separate occasions, at least three of which were
hardware that had been running stably for quite some time and that
I was rebuilding for other purposes.  (And one of which is my personal
workstation which I tend to upgrade much more frequently than our
deployed servers.)  We've had problems with multi-card configurations,
where at different times each card has failed to properly initialize,
requiring a hard reset.  There may have been other failures at remote
sites whose cause we were not able to observe directly.

I'm not sure what the possible "fix" to this problem is.  Here are the
possibilities I have gathered from the information I've been able to find:

1) Downgrade any box to an older kernel that uses an older, more stable
version of the driver.  (We'd rather not, since we'd like to have
certain security fixes from later kernel releases.)

2) Install Donald Becker's 1.10 driver from www.scyld.com

3) Install Intel's driver from
http://support.intel.com/support/network/adapter/pro100/30504.htm

4) Wait for the next release of Andrey's driver where this bug will be
fixed

5) There is no bug, just occasional reports of this problem

Can anyone please clarify what's known and what's assumed about this
situation (i.e. is it a known bug?  Is it not a bug?  Is it maybe a bug
but only with certain hardware?  Is it only a bug for certain people
with bad karma?) and what the recommended course of action is to work
out this problem?

Thanks!

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
With Microsoft products, failure is not           Derek Glidden
an option - it's a standard component.      http://3dlinux.org/
Choose your life.  Choose your            http://www.tbcpc.org/
future.  Choose Linux.              http://www.illusionary.com/