[3c509] 3c509B TCP overflow trouble

Marco Emilio Poleggi poleggi@dis.uniroma1.it
Fri, 26 Oct 2001 15:45:41 +0200


Donald Becker wrote:

> On Mon, 22 Oct 2001, Marco Emilio Poleggi wrote:
> 
> 
>>I noticed an anomalous behaviour of my 3c509B TCP ("combo") card: it
>>gets stuck intermittently (it can neither receive nor transmit) for a
>>few minutes. This happens on both connectors (UTP and BNC), and the
>>log reports only this line:
>>
>>  kernel: _M_str_putnext: queue overflow: dropping a message
>>
>>I tried to increase /proc/sys/net/core/netdev_max_backlog, but nothing
>>changed (only the kernel messages disappeared!).
>>I compiled the module statically into a 2.4.5 kernel. I noticed the same
>>trouble on a 2.2.16 kernel too.
>>
> 
> Presumably this error message is from the 2.4 kernel.  Your problem
> isn't with the 3c509 device driver, it's with the kernel.


Yup! Most probably you're right. In fact, I'm now trying a 3c905b with similar
results, as the dmesg output shows:

eth0: Transmit error, Tx status register 82.
   Flags; bus-master 1, dirty 23901(13) current 23905(1)
   Transmit list 0d512200 vs. cd512540.
   0: @cd512200  length 800005ea status 800005ea
   1: @cd512240  length 800005ea status 000105ea
   2: @cd512280  length 800005ea status 000105ea
   3: @cd5122c0  length 800005ea status 000105ea
   4: @cd512300  length 800005ea status 000105ea
   5: @cd512340  length 800005ea status 000105ea
   6: @cd512380  length 800005ea status 000105ea
   7: @cd5123c0  length 800005ea status 000105ea
   8: @cd512400  length 800005ea status 000105ea
   9: @cd512440  length 800005ea status 000105ea
   10: @cd512480  length 800005ea status 000105ea
   11: @cd5124c0  length 800005ea status 000105ea
   12: @cd512500  length 800005ea status 000105ea
   13: @cd512540  length 800005ea status 000105ea
   14: @cd512580  length 800005ea status 000105ea
   15: @cd5125c0  length 800005ea status 000105ea
_M_str_putnext: queue overflow: dropping a message
_M_str_putnext: queue overflow: dropping a message
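
For anyone who wants to compare dumps like the one above across several
hangs, here is a minimal Python sketch that parses the descriptor lines and
tallies the status words. It does not interpret the status bits. The only
interpretation it makes is an assumption: that the low 13 bits of the
"length" word hold the byte count. Under that assumption 0x5ea is 1514
bytes, which would match a full-size Ethernet frame (1500 bytes of payload
plus a 14-byte header). The names parse_ring and summarize are made up for
this example.

```python
import re
from collections import Counter

# Excerpt of the descriptor dump quoted above (entries 0-2 only, for brevity).
DUMP = """\
   0: @cd512200  length 800005ea status 800005ea
   1: @cd512240  length 800005ea status 000105ea
   2: @cd512280  length 800005ea status 000105ea
"""

# One descriptor line: index, address, length word, status word (all hex
# except the index).
ENTRY = re.compile(
    r"^\s*(\d+): @([0-9a-f]+)\s+length ([0-9a-f]+) status ([0-9a-f]+)\s*$"
)


def parse_ring(text):
    """Return a list of (index, address, length_word, status_word) tuples."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            idx = int(m.group(1))
            addr, length, status = (int(g, 16) for g in m.group(2, 3, 4))
            entries.append((idx, addr, length, status))
    return entries


def summarize(entries):
    """Count distinct status words and collect the assumed byte counts."""
    statuses = Counter(status for _, _, _, status in entries)
    # Assumption: the low 13 bits of the length word are the byte count.
    sizes = {length & 0x1FFF for _, _, length, _ in entries}
    return statuses, sizes


if __name__ == "__main__":
    statuses, sizes = summarize(parse_ring(DUMP))
    for status, count in statuses.items():
        print(f"status {status:08x}: {count} descriptor(s)")
    print("byte counts:", sorted(sizes))
```

Feeding it the full 16-entry dump instead of the excerpt only requires
pasting the remaining lines into DUMP.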


I found that the last errors (_M_str_putnext...) come from an external module,
but I don't know whether they're related to the network malfunctions. In any
case, I've removed that module to see if things improve...


> 
> What is the error message with 2.2.16?

No errors! The network just got stuck...



Anyway, I don't want to abuse this mailing list, but I'd like to ask two things:


1) Why does the above log show a Tx error on eth0, whereas 'netstat -i' doesn't
report it (see below)?

Kernel Interface table
Iface   MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0  2271047   2275      0      0    25415      0      0      0 BRU
lo    16436   0    20185      0      0      0    20185      0      0      0 LRU
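
To watch whether the TX-ERR counter ever moves while the kernel logs
transmit errors, the table can be turned into numbers with a short script.
This is only a sketch: it assumes the column layout shown above, and the
function name parse_netstat_i is made up for this example.

```python
# The 'netstat -i' output quoted above, banner included.
NETSTAT = """\
Kernel Interface table
Iface   MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0  2271047   2275      0      0    25415      0      0      0 BRU
lo    16436   0    20185      0      0      0    20185      0      0      0 LRU
"""


def parse_netstat_i(text):
    """Map each interface name to a dict of its counters (ints) and flags."""
    header = None
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Iface":
            header = fields  # column names for the rows that follow
            continue
        if header is None or len(fields) != len(header):
            continue  # skips the banner and any malformed row
        row = dict(zip(header, fields))
        iface = row.pop("Iface")
        flags = row.pop("Flg")
        counters = {name: int(value) for name, value in row.items()}
        counters["Flg"] = flags
        table[iface] = counters
    return table


if __name__ == "__main__":
    for iface, counters in parse_netstat_i(NETSTAT).items():
        print(iface, "RX-ERR:", counters["RX-ERR"], "TX-ERR:", counters["TX-ERR"])
```

Running it on two snapshots taken before and after a hang and comparing
the dictionaries would show which counters, if any, changed.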


2) I noticed that during the network malfunctions ARP resolution stops working
(the 'arp' command never returns), so the local network is unreachable (except
the gateway!), while the outside world is reachable (e.g. via HTTP). So I
conjecture that the problem lies in the ARP handling.

Does anybody know of ARP problems with 2.4.5 kernels?
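
One way to look at the kernel's ARP table while the 'arp' command hangs
might be to read /proc/net/arp directly. The sketch below assumes the usual
six-column layout of that file and treats an entry as incomplete when the
ATF_COM flag (0x02 in <linux/if_arp.h>) is not set. The SAMPLE addresses are
invented for illustration and are not taken from my machine; parse_arp and
incomplete are hypothetical helper names.

```python
import os

ATF_COM = 0x02  # "completed entry" flag from <linux/if_arp.h>

# Invented sample in the /proc/net/arp layout, used when the file is absent.
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.0.1      0x1         0x2         00:11:22:33:44:55     *        eth0
192.168.0.7      0x1         0x0         00:00:00:00:00:00     *        eth0
"""


def parse_arp(text):
    """Parse /proc/net/arp-style text into a list of dicts."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) != 6:
            continue  # ignore anything that does not look like an entry
        ip, _hw_type, flags, mac, _mask, dev = fields
        entries.append({"ip": ip, "flags": int(flags, 16), "mac": mac, "dev": dev})
    return entries


def incomplete(entries):
    """Return the entries whose hardware address has not been resolved."""
    return [e for e in entries if not e["flags"] & ATF_COM]


if __name__ == "__main__":
    path = "/proc/net/arp"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE
    for entry in incomplete(parse_arp(text)):
        print("unresolved:", entry["ip"], "on", entry["dev"])
```

If the local hosts show up as unresolved during a hang while the gateway
stays resolved, that would at least be consistent with the ARP conjecture.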



Bye!


m.e.p.





-- 
________________________________________________

Ing. Marco Emilio Poleggi
Universita' degli Studi di Roma "La Sapienza"
Dipartimento di Informatica e Sistemistica
Via Salaria 113, 00198 Roma - Italy
Tel: +39 06 49918479    Fax: +39 06 85300849
E-mail: poleggi@dis.uniroma1.it