More Boomerang fixes

Rob Riggs rob@devilsthumb.com
Tue Jul 7 10:22:33 1998


This is a multi-part message in MIME format.
--------------7E8B8744F448795110F28949
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

After a decent night's sleep and a few helpful replies
to my last post, I've come up with a new and improved
Boomerang patch. This patch is against 3c59x.c version
.99E, as included with the 2.0.34 kernel.

This patch should fix some of the major problems that
many with 3c90x "boomerang" cards are seeing. In
particular, it should fix the "skput:over" kernel
panics that many, including myself, are experiencing.

This patch does four things. It adds a check for NULL
skb pointers, removes the 'rx_work_limit' variable and
broken (off by one) check for available ring buffer
skbs, logs a warning if the driver fails to allocate
an skb for the ring buffer, and fixes a problem with
the ring buffer counters overflowing.

The second and fourth items are probably the cause of
the major instability that most people have been having
with the boomerang cards. The first item removes the
need for the broken check, since if the skb pointer is
NULL, we've exceeded our "rx_work_limit". The third
item is for debugging purposes.

The "off by one" error can cause a call to skb_put
with a NULL skb pointer. When the buffer counter
overflows, the ring buffer will no longer be filled,
because the for() loop conditional used for refilling
the ring buffer will always fail, causing the a call
to skb_put with a NULL skb pointer once all of the skbs
are used. Both of these can (will?) cause a kernel panic.

As a point of reference, judging by current usage on
my 10BaseT network, it would take 200 days to cause
the buffer counters to overflow. On a similarly loaded
100BaseT nework it would take only 20 days.

The "off by one" error is most likely the cause of the
problem I have been having. The kernel panics only occur
during periods of high network activity (20GB nightly
backups accross the network -- a very busy network and
a very busy server). This makes sense, since it should
only be triggered when we have more than RX_RING_SIZE
frames to deal with at one time.

The patch I posted yesterday will fix all of the above
mentioned problems, but did introduce a bug that occurs
when the buffer counters overflow. It will cause the
use of a negative subscript to the ring buffer array,
which can cause kernel data corruption. (This can only
occur after the counters have exceeded INT_MAX). It also
introduced its own "off by one" error when dealing with
the counter overflow. Both are fixed in this patch.

Thanks to Mauricio Sanchez and Gilles Noyer for their
helpful responses.

-Rob

P.S. I have not had the ability to test this patch yet,
but it does compile cleanly. I won't be able to test it
until later this afternoon. You have been warned. :-)
--------------7E8B8744F448795110F28949
Content-Type: text/plain; charset=us-ascii; name="3c59x.patch2"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="3c59x.patch2"

--- 3c59x.c.orig	Mon Jul  6 12:16:25 1998
+++ 3c59x.c	Tue Jul  7 11:11:52 1998
@@ -1755,7 +1755,6 @@
 	int entry = vp->cur_rx % RX_RING_SIZE;
 	int ioaddr = dev->base_addr;
 	int rx_status;
-	int rx_work_limit = vp->dirty_rx + RX_RING_SIZE - vp->cur_rx;
 
 	if (vortex_debug > 5)
 		printk(KERN_DEBUG "  In boomerang_rx(), status %4.4x, rx_status "
@@ -1801,6 +1800,8 @@
 				void *temp;
 				/* Pass up the skbuff already on the Rx ring. */
 				skb = vp->rx_skbuff[entry];
+				if (skb == NULL)		/* Ring buffer empty */
+					break;
 				vp->rx_skbuff[entry] = NULL;
 #if LINUX_VERSION_CODE >= 0x10300
 				temp = skb_put(skb, pkt_len);
@@ -1833,17 +1834,22 @@
 			vp->stats.rx_packets++;
 		}
 		entry = (++vp->cur_rx) % RX_RING_SIZE;
-		if (--rx_work_limit < 0)
-			break;
+		/* Wrap our counters -- otherwise the next for() loop fails */
+		if (vp->cur_rx == INT_MAX) {
+			vp->cur_rx = 0;
+			vp->dirty_rx -= INT_MAX;
+		}
 	}
 	/* Refill the Rx ring buffers. */
 	for (; vp->dirty_rx < vp->cur_rx; vp->dirty_rx++) {
 		struct sk_buff *skb;
-		entry = vp->dirty_rx % RX_RING_SIZE;
+		entry = (vp->dirty_rx < 0 ? vp->dirty_rx + INT_MAX : vp->dirty_rx) % RX_RING_SIZE;
 		if (vp->rx_skbuff[entry] == NULL) {
 			skb = DEV_ALLOC_SKB(PKT_BUF_SZ);
-			if (skb == NULL)
+			if (skb == NULL) {
+				printk(KERN_DEBUG "boomerang: Could not allocate skb on ring buffer.\n");
 				break;			/* Bad news!  */
+			}
 			skb->dev = dev;			/* Mark as being used by this device. */
 #if LINUX_VERSION_CODE > 0x10300
 			skb_reserve(skb, 2);	/* Align IP on 16 byte boundaries */

--------------7E8B8744F448795110F28949--