[tulip-bug] patch avoids lockups under high load

Josip Loncaric josip@icase.edu
Fri, 02 Feb 2001 18:19:56 -0500



When subjected to very heavy load, our FA310TX (PNIC) cards are prone to
protracted (even indefinite) lockups with tulip.c, including the latest
v0.92t (1/15/2001).

The problem is that under heavy load the driver masks interrupting
sources and expects to receive TimerInt, which never arrives (the timer
is apparently nonfunctional on FA310TX).  Sometimes so many sources are
masked that no new interrupts can arrive and the card is completely
locked until network restart.  In my stress tests, this happens about
2-6 times per minute (i.e. with probability of about one in a million).

Attached is a simple patch which avoids this problem with high
probability.  The idea is to acknowledge all interrupt sources as usual but
to avoid turning off interrupting sources unless the work budget was
exceeded two or more times in a row.  This still protects the system,
but drastically reduces the probability of a lockup.
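
The counting rule can be modeled outside the driver.  The following is a
minimal sketch in standalone C, not driver code: `struct load_state` and
`should_mask()` are hypothetical names introduced for illustration, and
only the counter mirrors `HiLoadCount` from the patch.  It assumes the
counter is reset on every budget check that passes, as the `else` branch
in the patch does.

```c
#include <stdbool.h>

/* Hypothetical container for the counter the patch calls HiLoadCount. */
struct load_state {
    int hi_load_count;  /* consecutive budget checks that were exceeded */
};

/*
 * Decide whether interrupt sources should be masked.
 * budget_exceeded: true when the per-interrupt work budget ran out.
 * Returns true only on the second and later consecutive overruns.
 */
static bool should_mask(struct load_state *s, bool budget_exceeded)
{
    if (!budget_exceeded) {
        /* Load dropped back within budget: forget earlier overruns. */
        s->hi_load_count = 0;
        return false;
    }
    /* Post-increment: the first overrun sees 0 and does not mask. */
    return s->hi_load_count++ > 0;
}
```

Starting from a zeroed state, the first overrun returns false (sources
stay enabled), an immediately following overrun returns true, and any
check within budget resets the count.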

We are now using the patched tulip.c v0.92t, and so far the lockup
problem seems to be gone.

Sincerely,
Josip


-- 
Dr. Josip Loncaric, Senior Staff Scientist        mailto:josip@icase.edu
ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric@larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134

[Attachment: tulip.c-0.92t-patch]

--- tulip.c-0.92t	Mon Jan 15 02:29:36 2001
+++ tulip.c	Fri Feb  2 17:39:50 2001
@@ -38,6 +38,7 @@
 
 /* Maximum events (Rx packets, etc.) to handle at each interrupt. */
 static int max_interrupt_work = 25;
+static int HiLoadCount = 0; 	/* number of times in a row when max_interrupt_work exceeded */
 
 #define MAX_UNITS 8
 /* Used to pass the full-duplex flag, etc. */
@@ -2828,12 +2829,16 @@
 				   to develop a good interrupt mitigation setting.*/
 				outl(0x8b240000, ioaddr + CSR11);
 			} else {
+			    if (HiLoadCount++ > 0) {
 				/* Mask all interrupting sources, set timer to re-enable. */
 				outl(((~csr5) & 0x0001ebef) | AbnormalIntr | TimerInt,
 					 ioaddr + CSR7);
 				outl(0x0012, ioaddr + CSR11);
+			    }
 			}
 			break;
+		} else {
+			HiLoadCount=0;
 		}
 	} while (1);
 
