[realtek-bug] v1.13: 8139 card hangs under heavy load (easy to reproduce)

Jochen Tuchbreiter (domainfactory) jt@domainfactory.de
Sat, 7 Jul 2001 10:32:44 +0200


(The same bug report was also submitted to the 8139too project, since this
problem also occurs with kernel 2.4.x.)

It looks like there is a problem with the Realtek 8139 under heavy load:
while several concurrent full-rate downloads are running (on a system with
slow disk I/O, for reasons unknown), the NIC simply hangs after a few
seconds. The system keeps running, and ifdown / ifup brings the NIC back up.

We can reproduce this problem on all of our systems that use 8139 cards,
with kernel 2.2.19 (both with the stock rtl8139 driver and with the latest
rtl8139) and with 2.4.5. Here is how we do it:

We use:

Machine a) an IDE machine with a Realtek card
Machine b) another machine running a web server that provides a large file
for download

a) and b) are connected to each other via 100 Mbit Ethernet (switched in our case).

- Turn DMA mode off for all drives in machine a) (hdparm -d0 /dev/hda ...).
This step is necessary to provoke the problem.

- Run "wget http://machine.b/bigfile &" about 10 times, so that the downloads run concurrently.

- The network on machine a) hangs after a few seconds.
- After replacing the NIC in machine a) with a non-Realtek NIC (we tried
3Com and Tulip cards), the same test does not hang the network.

IMHO this could mean that the problem occurs when some of the NIC's buffers
overrun, but you are the experts here ;-)

It is possible that this is a generic problem with the 8139 chip and not
Linux-related.

Most of the time, an error message like "eth0: Oversized Ethernet frame,
status 4a398a48!" (the number varies) appears on the console when the
card hangs.
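The status value printed in that message can be taken apart by hand. Below is a minimal sketch in C, assuming the receive header layout used by the Linux 8139 drivers: the chip places a 4-byte header in front of each frame in its receive ring, with the status flags in the low 16 bits and the frame length (including the 4-byte CRC) in the high 16 bits. The flag values follow the RxStatusBits names in the 8139too source and should be checked against the RTL8139 datasheet; the helper names (rx_frame_len, rx_flags, rx_header_is_bad, print_rx_header) are hypothetical, and the validity check is a simplification of what the real drivers do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Rx status flag values as named in the Linux 8139too driver
 * (enum RxStatusBits). Assumption: verify against the datasheet. */
enum {
    RX_STATUS_OK  = 0x0001,
    RX_BAD_ALIGN  = 0x0002,
    RX_CRC_ERR    = 0x0004,
    RX_TOO_LONG   = 0x0008,
    RX_RUNT       = 0x0010,
    RX_BAD_SYMBOL = 0x0020,
    RX_BROADCAST  = 0x2000,
    RX_PHYSICAL   = 0x4000,
    RX_MULTICAST  = 0x8000
};

/* Largest standard Ethernet frame: 1514 bytes plus the 4-byte CRC.
 * The real drivers use their own (somewhat larger) limits; this
 * constant is an assumption made for the sketch. */
#define MAX_FRAME_WITH_CRC (1514u + 4u)

/* High 16 bits of the header: frame length in bytes, CRC included. */
static unsigned rx_frame_len(uint32_t hdr) { return hdr >> 16; }

/* Low 16 bits of the header: per-frame status flags. */
static unsigned rx_flags(uint32_t hdr) { return hdr & 0xffffu; }

/* Nonzero if the header looks invalid under this sketch's rules:
 * length above the standard maximum, or the "OK" flag not set. */
static int rx_header_is_bad(uint32_t hdr)
{
    return rx_frame_len(hdr) > MAX_FRAME_WITH_CRC ||
           !(rx_flags(hdr) & RX_STATUS_OK);
}

/* Print a human-readable breakdown of one Rx header word. */
static void print_rx_header(uint32_t hdr)
{
    unsigned flags = rx_flags(hdr);

    printf("header 0x%08lx: length %u bytes, flags 0x%04x\n",
           (unsigned long)hdr, rx_frame_len(hdr), flags);
    printf("  ok=%d too_long=%d crc_err=%d runt=%d bad_align=%d\n",
           !!(flags & RX_STATUS_OK), !!(flags & RX_TOO_LONG),
           !!(flags & RX_CRC_ERR), !!(flags & RX_RUNT),
           !!(flags & RX_BAD_ALIGN));
    printf("  %s\n", rx_header_is_bad(hdr) ? "rejected" : "accepted");
}
```

Under these assumptions, print_rx_header(0x4a398a48) reports a length of 19001 bytes (0x4a39) with the "too long" flag set and the "OK" flag clear, far above the 1518 bytes of a standard Ethernet frame. The other values seen on the console can be checked the same way.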

I will gladly help if you have trouble reproducing this problem; we were
able to reproduce it with PCI adapters as well as with on-board Realtek
8139 chips.

Please contact me if any questions arise.

If you want me to describe the problem more precisely or run diagnostic
tools, please tell me or point me to the instructions for this.

Best regards,

Jochen Tuchbreiter

P.S.
The 8139too authors say they have "fixed" this problem now (I guess it's
this change, online at:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/gkernel/linux_2_4/drivers/net/8139too.c.diff?r1=1.2&r2=1.3 ),
but since we don't run 2.4.x on any production machines, I can't confirm
this.

--