"Transmit timed out" with EtherExpress Pro100B

Robert G. Brown rgb@phy.duke.edu
Tue Oct 6 11:02:24 1998


On 6 Oct 1998, Osma Ahvenlampi wrote:

> Interesting. I benchmarked the machine against another with a 3c905
> using a crossover wire, with the result of 94 Mbps. I agree there's
> nothing wrong with the speed of the card (actually, I was surprised
> the 3com was that fast).

UDP, not TCP.  I think TCP would come in around 94 Mbps by the time you
factor in the larger headers, but I don't remember exactly.
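
As a rough check on that header arithmetic, here is a back-of-the-envelope
sketch.  It assumes full-size 1500-byte frames, IPv4 with no options, and
standard Ethernet framing overhead on a 100 Mbps link; none of these
figures come from the measurements discussed in this thread, and the TCP
timestamp case is included only as an assumption about what a given stack
might negotiate.

```python
# Theoretical payload throughput on 100 Mbps Ethernet (a sketch, not a
# measurement). All sizes are in bytes and are the standard values.
PREAMBLE_SFD = 8        # preamble + start-of-frame delimiter
ETH_HEADER = 14         # dst MAC + src MAC + ethertype
FCS = 4                 # frame check sequence
INTERFRAME_GAP = 12     # mandatory idle time, expressed in byte times
MTU = 1500              # assumed: full-size frames
IPV4_HEADER = 20        # assumed: no IP options
UDP_HEADER = 8
TCP_HEADER = 20
TCP_TIMESTAMP_OPTION = 12  # assumed optional case: 10 bytes + 2 padding


def payload_mbps(l4_header_bytes, link_mbps=100.0):
    """Payload rate when every frame on the wire is full-size."""
    wire_bytes = PREAMBLE_SFD + ETH_HEADER + MTU + FCS + INTERFRAME_GAP
    payload = MTU - IPV4_HEADER - l4_header_bytes
    return link_mbps * payload / wire_bytes


udp = payload_mbps(UDP_HEADER)
tcp = payload_mbps(TCP_HEADER)
tcp_ts = payload_mbps(TCP_HEADER + TCP_TIMESTAMP_OPTION)

print(f"UDP:              {udp:.2f} Mbps")     # 95.71
print(f"TCP:              {tcp:.2f} Mbps")     # 94.93
print(f"TCP + timestamps: {tcp_ts:.2f} Mbps")  # 94.15
```

These are one-way ceilings that ignore ACK traffic and host overhead, so
a real TCP benchmark landing near 94 Mbps is about what the arithmetic
predicts.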

> > headers.  This is going both ways at once, so the aggregate bandwidth
> > was 191 Mbps.  The network is stable enough that several of the systems
> > were up for months diskless (while awaiting a functioning aic7xxx).
> 
> In a network that has multicast traffic?

It is a switched network, and there are certainly broadcasts, but not a
ton of multicasts, especially from e.g. AppleTalk or NT -- these tend to
be isolated on their own VLAN.

> > The difficulties experienced on a poweredge 2300 could easily come from
> > a mixture of a kernel swap/mmap bug, the aic7xxx 7890 driver (which is
> > still pre-release, after all) and the eepro100.  I don't deny that there
> 
> This is true. Since I can't boot the machine without the aic7xxx
> pre-release driver (which is in its last beta before the real release,
> so if there is a bug in it, it should be pointed out ASAP), it's
> difficult for me to eliminate that variable. PowerEdge has the eepro
> and the aic7xxx on different interrupts, though..
> 
> Anyway, I just patched the eepro100 driver to
> multicast_filter_limit=0, re-enabled appletalk, and put two machines
> pinging the pe2300 constantly. If I don't see the problem within one
> hour, I would consider it pretty clear what's broken. With only
> surface knowledge of Linux internals and absolutely no knowledge of
> the Intel Speedo3 chip, I can't do anything about a fix, though.

Well, VLANs are great if you can do it with your switch.  Isolate the
Macs off by themselves where they can chatter away.  Spurious network
traffic eats CPU in addition to bandwidth.  I think that some of the
problems may come from deeper in the kernel, though, and not just in the
card.  One reason I think this is that they are just (re)surfacing as a
new generation of very fast systems like the 2300s appears.  With U2W
SCSI controllers, fast ethernet controllers, and a heavy task mix you
are pushing some kernel latencies, possibly past some point of failure.

Has anybody tried the 2.0.36pre stuff?  Alan has released a bunch of
patches that are supposed to address some of the kernel bugs in 2.0.35;
perhaps one of them "fixes" this...

> If the other poster is right, and this problem can be demonstrated
> simply by running gated, that doesn't leave THAT many layers. Pretty
> much the minimum layers for a functioning IP stack, in fact.

That's a lot of room, if you think about it... especially if the problem
is load dependent.  Do you have an SMP system?  Do you think that this
could be an interrupt resolution issue?

> While I would personally like to see this work, and do something to
> reach that goal, I simply can not justify the hours spent in internal
> hacking instead of business. It's cheaper for us to buy a new card for
> the machine. I'm just afraid that next time I'm configuring a server,
> I'll have the same problem in front of me.

   rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu