trouble with mismatched network
William J. Earl
wje@cthulhu.engr.sgi.com
Mon Oct 25 19:06:32 1999
Donald Becker writes:
> On Mon, 25 Oct 1999, William J. Earl wrote:
>
> > > I have a Linux box acting as a server and gateway between a Windows
> > > LAN and a Cisco 675 router to the Internet through an ADSL connection.
> ..
> > > having is that none of the Windows boxes can transfer files larger
> > > than about 10 KB to the Linux box before timing out. The Linux box can
> > > transfer files to the Windows boxes with no problem.
> ..
> > This sounds like the problem I was having, until I applied the
> > patch I sent out to the list a few months ago, based on a workaround
> > in the BSD driver for an 82c168 bug. The part sometimes prefixes an
>
> The word from LiteOn is that the BSD "fix" is only needed because they are
> using the chip in the chained descriptor mode, which isn't supported by the
> PNIC. The Linux driver uses the chip in descriptor ring mode (and also
> chains the descriptors for work-alikes that don't use ring mode).
Without the workaround, my card was seeing many corrupted packets;
with it, I see none. This suggests that LiteOn is mistaken.
> > incoming packet with a lot of garbage, which leads to the packet
> > being dropped. The workaround recovers the packet by deleting the
> > garbage. If the patch is not in the latest driver, try applying it
> > (it is in the list archives).
>
> The work-around has high overhead and is an ugly hack.
>
> [[ The proper fix, using ring mode, requires complex code to work with the
> BSD list-of-mbufs. We receive into a single linear buffer, which permits
> simpler buffer management and more efficient PCI bursts. ]]
The overhead is only high if you actually get corrupted packets,
aside from the cost of clearing every packet buffer, and even that cost
is only incurred if the chip is one of the problem revisions. As near as
I can tell, in the failure case the garbage-prefixed packet spills across
multiple buffers in the ring, so the real packet is sometimes split across
two of them. I don't see how the single linear buffer changes matters,
since the failure case effectively increases the size of the packet beyond
the nominal limit, and hence, in general, beyond the size of the linear
buffer.
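For concreteness, the kind of recovery I mean is roughly the following.
This is only an illustrative sketch: the function name is mine, and the way
the real frame start is located (looking for an Ethernet header addressed
to us or to broadcast) is an assumption on my part, not necessarily what
the actual patch in the archives does.

    /* Sketch: when the chip has prefixed the frame with garbage, scan
     * the (already linear) receive buffer for a plausible Ethernet
     * header and slide the real frame down to the start of the buffer.
     * Returns the recovered length, or -1 if nothing plausible is
     * found, in which case the packet really should be dropped. */
    #include <string.h>

    #define ETH_ALEN 6              /* octets in an Ethernet address */
    #define ETH_HLEN 14             /* octets in an Ethernet header */

    static int strip_rx_garbage(unsigned char *buf, int len,
                                const unsigned char *our_mac)
    {
        static const unsigned char bcast[ETH_ALEN] =
            { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
        int i;

        for (i = 0; i + ETH_HLEN <= len; i++) {
            if (memcmp(buf + i, our_mac, ETH_ALEN) == 0 ||
                memcmp(buf + i, bcast, ETH_ALEN) == 0) {
                if (i > 0)
                    memmove(buf, buf + i, (size_t)(len - i));
                return len - i;
            }
        }
        return -1;
    }

(A check like this is defeated by promiscuous and multicast reception,
which is part of why it is an ugly hack rather than a real fix.)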
What do you believe to be the proper fix? I would be happy to
implement an alternative and determine whether it works.
> > Someone suggested turning off CPU-to-PCI write buffering in
> > the BIOS setup. This helped somewhat in my case.
> >
> > Also, if you get messages about "too much work at interrupt level",
> > you can add
> >
> > options tulip max_interrupt_work=75
> > That helped in my case as well.
>
> Hmmm, is your machine slow or very overloaded?
> Make certain it isn't going into power-save mode and slowing down the CPU!
No, it is a K6-2 at 450 MHz, and I don't have power-saving mode enabled.
The typical failure case was a backup of the entire system using a
Networker server. The traffic consists of maximum-size data packets
being transmitted at the maximum rate, and a roughly comparable number
of much smaller packets (ACKs and other overhead) being received.
I was only getting a small number of failures per day, but that was enough
to cause serious problems, since the driver seems not to recover cleanly
when the limit is exceeded.
Are you sure that the limit cannot be exceeded by extra packets
arriving while you are processing previous packets, particularly if the
packets are small?
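Just to be concrete about the pattern I have in mind, here is a toy
sketch (all names and numbers below are my own, not the driver's actual
code): the work budget is per invocation of the interrupt handler, so a
long enough burst of small received packets can exhaust it regardless of
how fast the CPU is.

    #include <stdio.h>

    #define MAX_INTERRUPT_WORK 25   /* typical default; the option raises it, e.g. to 75 */

    /* Pretend chip: a fixed burst of pending events, as if small ACK
     * packets kept arriving while earlier ones were being handled. */
    static int pending_events = 40;

    static int  chip_has_pending_event(void) { return pending_events > 0; }
    static void handle_one_event(void)       { pending_events--; }

    static void example_interrupt(const char *ifname)
    {
        int budget = MAX_INTERRUPT_WORK;

        while (chip_has_pending_event()) {
            handle_one_event();
            if (--budget < 0) {
                /* Where the "too much work" complaint comes from. */
                printf("%s: too much work at interrupt level.\n", ifname);
                break;
            }
        }
    }

    int main(void)
    {
        example_interrupt("eth0");  /* prints the complaint: 40 events, 25 budget */
        return 0;
    }

If the real handler counts work the same way, a burst of incoming ACKs
during a full-rate transmit would seem to be exactly the situation that
overruns the default budget.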