[vortex-bug] tbusy and tx_full

Donald Becker becker@scyld.com
Thu, 18 May 2000 12:05:09 -0400 (EDT)


On Thu, 18 May 2000, Bogdan Costescu wrote:
> On Wed, 17 May 2000, Donald Becker wrote:
> 
> > > vortex_interrupt (which might not be OK because it's async WRT
> > > start_xmit).
> > That's the point of the tx_full flag in many of the drivers.  See
> >     ftp://scyld.com/pub/network/pci-skeleton.c
> > You create a race condition if you clear dev->tbusy when the transmit path
> > thinks that it is locked against re-entry.
> 
> I agree, but why do you clear tbusy in vortex_interrupt? If tbusy is for
> locking only, it should be set at entry and cleared at exit by the routine
> which wants to be protected. You have there ...

The dev->tbusy flag isn't just for the driver to lock its transmit routine.
It is used to tell the software queue layer that the device cannot accept
another packet to transmit.  The driver should leave dev->tbusy set when it
cannot accept more packets, and clear it in the interrupt routine as Tx
space becomes available.
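
In outline it looks roughly like this -- a condensed sketch of the
pci-skeleton.c pattern rather than the literal 3c59x code, with the
vortex_private fields and the TX_QUEUE_LEN limit as stand-ins:

	static int vortex_start_xmit(struct sk_buff *skb, struct device *dev)
	{
		struct vortex_private *vp = (struct vortex_private *)dev->priv;
		int entry;

		/* Lock against re-entry and timer-based transmits. */
		if (test_and_set_bit(0, (void *)&dev->tbusy) != 0)
			return 1;

		entry = vp->cur_tx % TX_RING_SIZE;
		vp->tx_skbuff[entry] = skb;
		/* ... fill in the descriptor and kick the transmitter ... */
		vp->cur_tx++;

		if (vp->cur_tx - vp->dirty_tx < TX_QUEUE_LEN)
			clear_bit(0, (void *)&dev->tbusy);  /* room left, accept more */
		else
			vp->tx_full = 1;   /* leave tbusy set; cleared in the interrupt */

		return 0;
	}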

> >   vp->tx_full  &&  vp->cur_tx - dirty_tx < TX_QUEUE_LEN
> > Means the transmit queue was full, but is now accepting more entries.
> 
> I understand your logic, but why make it so complicated, with tbusy
> based on tx_full and the ring indexes, and then tx_full based on the
> indexes?

Because dev->tbusy is used by the queue layer, and the private lp->tx_full
is used to tell the interrupt handler that it is safe to clear dev->tbusy.
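
Concretely, after the interrupt handler has walked the ring and freed the
completed entries, the usual pattern looks something like this (again only
a sketch, with the same stand-in names):

	/* dirty_tx has just been advanced past the freed entries. */
	if (vp->tx_full && vp->cur_tx - dirty_tx < TX_QUEUE_LEN) {
		vp->tx_full = 0;                    /* queue has room again  */
		clear_bit(0, (void *)&dev->tbusy);  /* let net_bh send to us */
		mark_bh(NET_BH);                    /* and run it soon       */
	}
	vp->dirty_tx = dirty_tx;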

> > You get much better system performance if you put some hysteresis in the
> > check e.g.
> >   vp->tx_full  &&  vp->cur_tx - dirty_tx < TX_QUEUE_LEN - 4
> > This allows four packets at a time to be queued.
> 
> I assume that you're talking here about the interrupt. But I don't
> understand your logic: if you have a full ring and you free only one
> DPD in the current interrupt, with this hysteresis you don't clear
> tx_full and tbusy. Then start_xmit cannot queue another packet when it
> is called; you have to wait until 4 DPDs are freed before you can queue
> again. That only lets you queue in bursts; why do you say this gives
> better system performance?

Background:
Most PCI drivers have a Tx ring size of 16, perhaps limited to a queue
length of 10 or 12.  The hysteresis puts the refill mark at 8 packets.

Triggering the BH to run is somewhat expensive.  With hysteresis you let
the queue layer do a chunk of work at once, rather than constantly bouncing
between the interrupt handler and the queue-layer refill.
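
In the sketch above only the refill check changes (the constant 4 is just
the example value from the condition quoted earlier):

	/* Don't wake the queue layer until four entries have opened up,
	   so net_bh can refill a batch in one pass. */
	if (vp->tx_full && vp->cur_tx - dirty_tx < TX_QUEUE_LEN - 4) {
		vp->tx_full = 0;
		clear_bit(0, (void *)&dev->tbusy);
		mark_bh(NET_BH);
	}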

Occasionally (hmmm, three years ago -- time to re-test) I run tests to make
certain that the various driver constants are still good choices.  One of
those tests varies the Tx packet queue length.  With 100Mbps Ethernet and
typical TCP/IP traffic, the driver sees diminishing returns once the queue
length reaches 5-7 entries, and more than 10 entries almost never increases
performance.


> > The vortex driver operates differently than many because it has evolved
> > around the unusual 3Com design.  The 3Com Vortex design was one of the
> > earliest PCI bus masters, and suffers from several quirks.
> 
> But that's no reason why the driver should remain cumbersome now that
> it supports so many generations of cards...

Yes, the current 3c59x interrupt handler has the very bad behavior of
doing the following for each ring entry while cleaning up the Tx queue:
	if (inl(ioaddr + DownListPtr) == virt_to_bus(&vp->tx_ring[entry]))

That is an expensive PCI read for every packet freed.  It mostly made
sense with the 3c905, but it isn't needed at all with the 3c905B/C.
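
For contrast, here is a rough sketch of the cheaper reclaim loop many
descriptor-based drivers use.  It assumes, purely for illustration, that
the NIC writes a completion flag back into each descriptor in host memory;
the TX_DESC_COMPLETE bit and the status field name are made up for the
sketch, not taken from the 3c90x DPD layout:

	while (vp->cur_tx - dirty_tx > 0) {
		int entry = dirty_tx % TX_RING_SIZE;
		/* A read from host memory only -- no PCI transaction. */
		if (!(le32_to_cpu(vp->tx_ring[entry].status) & TX_DESC_COMPLETE))
			break;                 /* NIC hasn't finished this one */
		dev_kfree_skb(vp->tx_skbuff[entry]);
		vp->tx_skbuff[entry] = NULL;
		dirty_tx++;
	}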

Donald Becker				becker@scyld.com
Scyld Computing Corporation
410 Severn Ave. Suite 210
Annapolis MD 21403