[tulip] race condition leading to hung tx queue in .92, .93

Donald Becker becker@scyld.com
Tue Feb 19 12:15:01 2002


On Fri, 15 Feb 2002, Chris Friesen wrote:

> We have discovered a race condition that could lead to a hung tx queue
> in the .92 and .93 drivers.  Near the end of tulip_start_xmit(), there
> is the following code:
> 
>         if ( ! tp->tx_full)
>                 netif_unpause_tx_queue(dev);
>         else
>                 netif_stop_tx_queue(dev);

> The problem occurs if we fail the check (tx_full is set) and then,
> before reaching the else clause, get interrupted by tulip_interrupt(),
> which cleans up enough transmitted packets to clear tx_full and tbusy.
> The interrupt handler returns, and we proceed to set tbusy.  At that
> point we are left with tbusy set and tx_full cleared, and the driver
> never recovers.

Yes.  The Tulip driver has a different structure from most of the other
PCI netdrivers, so the check for the full->empty race implemented in
pci-skeleton.c does not apply here.
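
That check, as I recall it, marks the ring full first and then re-reads
the dirty pointer before deciding to keep the queue stopped, so a
transmit completion that sneaks in between the two steps is caught.
Roughly (a sketch from memory; the np->cur_tx/np->dirty_tx names and
the TX_QUEUE_LEN margins follow the skeleton's conventions and may not
match the shipped pci-skeleton.c exactly):

    /* Tail of hard_start_xmit() in the pci-skeleton.c style. */
    if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) {
        np->tx_full = 1;
        /* Re-read dirty_tx: the Tx interrupt may have freed
           descriptors between the test above and this point. */
        if (np->cur_tx - (volatile unsigned int)np->dirty_tx
            < TX_QUEUE_LEN - 2) {
            np->tx_full = 0;            /* Ring drained meanwhile. */
            netif_unpause_tx_queue(dev);
        } else
            netif_stop_tx_queue(dev);
    } else                              /* Ring still has room. */
        netif_unpause_tx_queue(dev);

The same idea, a second look at the ring state after deciding to stop,
is what the fix quoted below adds on the tulip side.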

> The fix is to change this code to the following:
> 
>         if ( ! tp->tx_full)
>                 netif_unpause_tx_queue(dev);
>         else {
>                 netif_stop_tx_queue(dev);
> 
>                 /* handle case of tulip_interrupt() running under our feet */
>                 if ( ! tp->tx_full)
>                         netif_start_tx_queue(dev);
>         }

Correct, although the preferred call here is
   netif_resume_tx_queue(dev);
The
   netif_start_tx_queue(dev);
call currently does the same thing, but it is intended to be used when
the interface is first started.
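
With that substitution, the tail of tulip_start_xmit() would read
roughly as follows (illustrative only; the locking and ring bookkeeping
earlier in the function are unchanged and omitted):

        if ( ! tp->tx_full)
                netif_unpause_tx_queue(dev);
        else {
                netif_stop_tx_queue(dev);

                /* tulip_interrupt() may have cleared tx_full after the
                   test above but before the queue was stopped; if so,
                   restart the queue rather than leaving it hung. */
                if ( ! tp->tx_full)
                        netif_resume_tx_queue(dev);
        }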


Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993