via-rhine SMP problems

Peter Monta pmonta@halibut.imedia.com
Mon Aug 30 05:18:47 1999


There seem to be some SMP-related problems with the via-rhine NIC
hardware or network driver; at least, all problems go away when
running under a UP kernel with identical configuration.  There was
some linux-kernel discussion on 11-12 August ("via-rhine nic
crazyness"), but no hints there that I could see.

I'm using 2.2.13pre1, except that its via-rhine.c has been
replaced with v1.03a.  All kernels are nonmodular (these
are diskless Beowulf nodes); the motherboard is an Abit
BP-6 with two Celeron 400s.

Running ttcp using the SMP kernel causes a number of "something
wicked happened" messages, and with the default ring sizes
of TX=8 and RX=16, very shortly no more packets will go in or out
of the box.  After increasing to TX=16 and RX=128, the network never
actually goes down, and indeed a single ttcp works perfectly however
long it runs; but with back-to-back ttcp runs, the "wicked" messages
return, with roughly an even mixture of 0009, 000a, and 000b.

The driver source says 0008 is TxAbort.  (I think the "wicked"
printk() fires only when some other bit, like TxDone or RxDone, is
set as well.)  I don't see what the transmit subsystem could be getting
unhappy about: the other end is a 5-port Linksys switch, and
everything is correctly at 100baseT full-duplex.
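
To make the status words easier to read, here is a minimal Python sketch
that splits each one into its set bits.  Only TxAbort = 0008 comes from
the driver source as quoted above; the RxDone = 0001 and TxDone = 0002
values are assumptions recalled from the driver's interrupt-status enum
and should be checked against via-rhine.c.  Any bit without a name in
the table (such as the 0800 seen in 0809) is printed as unknown rather
than guessed at.  The names INTR_BITS and decode_status are made up for
this illustration.

```python
# Bit names: only TxAbort = 0x0008 is stated in the post.  The RxDone and
# TxDone values are ASSUMPTIONS and should be verified against via-rhine.c.
INTR_BITS = {
    0x0001: "RxDone",   # assumption
    0x0002: "TxDone",   # assumption
    0x0008: "TxAbort",  # from the post
}


def decode_status(status):
    """Return names of the set bits, lowest bit first; unnamed bits as hex."""
    names = []
    bit = 1
    while bit <= status:
        if status & bit:
            names.append(INTR_BITS.get(bit, "unknown(0x%04x)" % bit))
        bit <<= 1
    return names


if __name__ == "__main__":
    # Status words taken from the log messages in this post.
    for word in ("0009", "000a", "000b", "0809", "080b"):
        print(word, decode_status(int(word, 16)))
```

Under these assumed values, every status word in the log has TxAbort set
together with RxDone, TxDone, or both.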

Sigh, the VT86C100A data sheet says this TxAbort bit is set when
there are excessive collisions; this is absurd, because the driver
is setting the chip full-duplex ("eth0: Setting full-duplex based on
MII #8 link partner capability of 41e1.").  The card is
VT3043-based though (D-Link DFE-530TX, chip marked "DL10030"),
so who knows what the bit really means.

What is the simplest way to isolate the TX and RX processing
with some sort of lock?  Would making start_tx() and
netdev_rx() mutually exclusive be a good idea, so as
to approach the UP case?  Or should I just dump the cards?

Beginning of log messages after faulty run:

Aug 30 01:03:39 n00 kernel: eth0: Something Wicked happened! 000a. 
Aug 30 01:03:39 n00 kernel: eth0: Something Wicked happened! 0009. 
Aug 30 01:03:39 n00 kernel: eth0: Something Wicked happened! 0009. 
Aug 30 01:03:39 n00 kernel: eth0: Something Wicked happened! 000b. 
Aug 30 01:03:39 n00 kernel: eth0: Something Wicked happened! 0009. 
Aug 30 01:03:40 n00 kernel: eth0: Something Wicked happened! 0009. 
Aug 30 01:03:40 n00 kernel: eth0: Something Wicked happened! 000a. 
Aug 30 01:03:40 n00 kernel: eth0: Something Wicked happened! 0809. 
Aug 30 01:03:40 n00 kernel: eth0: Something Wicked happened! 000a. 
Aug 30 01:03:40 n00 kernel: eth0: Something Wicked happened! 0009. 
Aug 30 01:03:40 n00 last message repeated 2 times
Aug 30 01:03:41 n00 kernel: eth0: Something Wicked happened! 000b. 
Aug 30 01:03:41 n00 kernel: eth0: Something Wicked happened! 0009. 
Aug 30 01:03:41 n00 kernel: eth0: Something Wicked happened! 000a. 
Aug 30 01:03:41 n00 kernel: eth0: Something Wicked happened! 0009. 
Aug 30 01:03:42 n00 kernel: eth0: Something Wicked happened! 0009. 
Aug 30 01:03:42 n00 kernel: eth0: Something Wicked happened! 000b. 
Aug 30 01:03:43 n00 kernel: eth0: Something Wicked happened! 000a. 
Aug 30 01:03:43 n00 kernel: eth0: Something Wicked happened! 0009. 
Aug 30 01:03:43 n00 kernel: eth0: Something Wicked happened! 0809. 
Aug 30 01:03:43 n00 kernel: eth0: Oversized Ethernet frame spanned multiple buffers, entry 0x176a5 length 0 status 0600! 
Aug 30 01:03:43 n00 kernel: eth0: Oversized Ethernet frame c7fde250 vs c7fde250. 
Aug 30 01:03:43 n00 kernel: eth0: Oversized Ethernet frame spanned multiple buffers, entry 0x176a6 length 82 status 528d00! 
Aug 30 01:03:43 n00 kernel: eth0: Oversized Ethernet frame c7fde260 vs c7fde260. 
Aug 30 01:03:43 n00 kernel: eth0: Something Wicked happened! 000b. 
Aug 30 01:03:43 n00 kernel: eth0: Something Wicked happened! 0009. 
Aug 30 01:03:43 n00 kernel: eth0: Something Wicked happened! 000a. 
Aug 30 01:03:43 n00 kernel: eth0: Something Wicked happened! 080b. 
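
For checking the "roughly even mixture" on a longer capture, the
following Python sketch tallies the status words in syslog text like the
excerpt above.  It assumes the message format shown here and treats a
"last message repeated N times" line as N more copies of the
immediately preceding "wicked" message; the function name tally and the
SAMPLE lines (copied from the excerpt) are illustrative only.

```python
import re
from collections import Counter

# Patterns match the syslog lines as they appear in this post.
WICKED = re.compile(r"Something Wicked happened! ([0-9a-f]{4})\.")
REPEAT = re.compile(r"last message repeated (\d+) times")


def tally(lines):
    """Count status words, crediting syslog repeat lines to the prior word."""
    counts = Counter()
    last = None  # status word of the immediately preceding line, if any
    for line in lines:
        m = WICKED.search(line)
        if m:
            last = m.group(1)
            counts[last] += 1
            continue
        r = REPEAT.search(line)
        if r and last is not None:
            counts[last] += int(r.group(1))
        else:
            last = None  # some other message intervened
    return counts


SAMPLE = """\
Aug 30 01:03:40 n00 kernel: eth0: Something Wicked happened! 000a.
Aug 30 01:03:40 n00 kernel: eth0: Something Wicked happened! 0009.
Aug 30 01:03:40 n00 last message repeated 2 times
Aug 30 01:03:41 n00 kernel: eth0: Something Wicked happened! 000b.
""".splitlines()

if __name__ == "__main__":
    for word, n in sorted(tally(SAMPLE).items()):
        print(word, n)
```

To run it on a real log, pass the open file object to tally() in place of
SAMPLE.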

Cheers,
Peter Monta   pmonta@imedia.com
Imedia Corp.