'Me too' on the transmit timeout, also need 1.05 driver

Peter T. Breuer ptb@it.uc3m.es
Wed Oct 6 12:09:20 1999


"A month of sundays ago Simon Andrew Boggis wrote:"
> 
> I have been running a few machines with eepro100's (8559 chipset) for
> several months with no apparent problems.

I also have few problems nowadays on most machines that use them
 ... running the standard 2.0.36 kernels with, it seems,
eepro100.c:v0.99B 4/7/98. The MTBF on my machine (a PII450) is longer
than the interval between power cuts (about 50 days). But I am limiting
multicast filtering AND I myself am on a 10BT net. 

That said, my machine is not unique.  We have several that are "exactly"
alike.  And there is one of the notionally "alike" machines that does
have problems with its eepro100 NIC.  I up/down the NIC every hour in
cron to forestall problems and I have an /sbin/request-route script that
tries to fix things if the routes get lost, which might indicate NIC
problems. If I don't do that it will stick in Tx timeouts every so
often.
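
For illustration only, here is a minimal sketch of the kind of "have the
routes got lost?" test a watchdog script could make. It is not the actual
/sbin/request-route script. It assumes a Linux /proc/net/route table, in
which the default route shows up as destination 00000000 with the RTF_UP
and RTF_GATEWAY flag bits set; the function name has_default_route and
its parameters are hypothetical.

```python
# Sketch of a "routes lost?" check, assuming the Linux /proc/net/route
# text format: one header line, then whitespace-separated fields
# (Iface, Destination, Gateway, Flags, ...) with hex-encoded values.
# has_default_route is a hypothetical helper, not part of any real tool.

RTF_UP = 0x0001       # route is usable
RTF_GATEWAY = 0x0002  # destination is reached via a gateway


def has_default_route(route_table_text, iface=None):
    """Return True if the table text contains an active default route.

    If iface is given, only routes on that interface are considered.
    """
    lines = route_table_text.splitlines()
    for line in lines[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        name, dest, _gateway, flags = fields[:4]
        if iface is not None and name != iface:
            continue
        try:
            flag_bits = int(flags, 16)
        except ValueError:
            continue  # malformed line, ignore it
        if dest == "00000000" and flag_bits & RTF_UP and flag_bits & RTF_GATEWAY:
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/net/route") as table:
            ok = has_default_route(table.read())
    except OSError:
        ok = False  # no readable route table on this system
    print("default route present" if ok else "default route missing")
```

A cron job could run a check like this and only bounce the interface and
re-add the routes when it reports the default route missing, rather than
doing so unconditionally every hour.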

I also have one P200 with the same NIC that shows the same
symptoms (same kernel source, etc., but compiled and running scsi
instead of ide).

> The longest lived machine (a router with 6 eepro100's) has been up for 100 
> days under a reasonable (many Gb's per day) routing load on a 2.2.9 kernel 

This is a higher load than mine. My machine logs about two million
packets sent and received per day.

> (eepro100.c: v1.06). Whilst I occasionally (every 2 weeks or so) get a message
> or two about a timeout the machine recovers by itself with no detectable glitch
> in its operation (or the network). I have set the multicast filter limit to 3
> as a precaution after reading the earlier discussions on this. I do use 
> multicast a little: for xntp, gated updates and for the CAP Universal 

We also are vulnerable to multicast emissions in the environment
(particularly when I hook into the mbone :-).

> Incidentally, we use 3com 3300 switches and NuSwitch (now Extreme Networks, I 

We are also on 3com switches. I have several nets below me using the
3com 100BT duplex switches, but they are with 3c905s.

> believe), as it seems from some of the recent discussion that certain switches
> *may* be a factor in some problems.

We certainly know of a problem with the university's powerhubs that was
bringing down our net at the beginning of last year. Seems the spanning
tree packet emitted by the (otherwise inactive) bridging code in our
kernels triggered a fault in these hubs that made them repeat and reflect
the emission continuously.  When several NICs on the old 3c509 net were in
promiscuous mode, a sort of resonance resulted, flooding the network
with small packets and taking out the buffers in the listening cards.
We can still reproduce it at will.  A critical mass of cards in
promiscuous mode, a few more machines being booted for the first time,
and bang. Horrible to see, and took a long while to understand and
trace.

The powerhubs' firmware fault was activated, apparently, when the network
managers discovered they needed to pass some windows editing protocol
transparently for the secretaries. I know no more. We found out by
checking the config of the powerhubs against the config a year before,
but I forget the precise thing.


Peter