[realtek] Bug in rtl8129_rx() & other problems

Donald Becker becker@scyld.com
Tue Apr 16 15:31:01 2002


On Tue, 16 Apr 2002, Stephan Brauss wrote:

> I think I have found a bug in rtl8129_rx(): It is possible that dev_alloc_skb()
> is called with a negative argument, which causes my machine to crash.

What driver version are you using?
What is the detection message?

> My system runs a heavy rtlinux task, that uses about 90% CPU time.
> Therefore, network interrupts are no more handled so quickly.

I suspect that the problem you are seeing is related to this.

> Anyway, after many hours of debugging, I have changed the code like follows:

> +                       if(pkt_size<0)
> +                       {
> +                               printk(KERN_ERR"%s: Impossible packet length.\n",dev->name);

Do you see this message?  What is the Rx status when this occurs?
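
For reference, a minimal sketch of the kind of length check being discussed,
written as a stand-alone helper rather than as the driver's actual code.  It
assumes the Rx header word carries the status in its low 16 bits and the frame
length (including the 4-byte CRC) in its high 16 bits.  The name
rx_payload_len() and both constants are hypothetical, chosen for illustration.

```c
#include <stdint.h>

#define RX_CRC_LEN    4     /* trailing Ethernet CRC counted in the length */
#define RX_MAX_FRAME  1518  /* assumption: no VLAN tags or jumbo frames */

/*
 * Hypothetical helper: extract the payload length from the 32-bit Rx
 * header word.  Returns the length without the CRC, or -1 when the
 * reported size cannot be a real frame, so the caller never passes a
 * negative or oversized value to the buffer allocator.
 */
static int rx_payload_len(uint32_t rx_header)
{
    unsigned int rx_size = rx_header >> 16;   /* assumed: length in high half */

    if (rx_size < RX_CRC_LEN || rx_size > RX_MAX_FRAME)
        return -1;
    return (int)(rx_size - RX_CRC_LEN);
}
```

Logging the full header word whenever the helper returns -1 would also answer
the Rx status question above.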

> With the patch, the kernel no more crashes but I still get other messages:
> eth0: Abnormal interrupt, status 00000011.

Rx Overflow

> eth0: Abnormal interrupt, status 00000021.

RxUnderrun
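
The two status values above can be decoded bit by bit.  The sketch below is
only an illustration: the bit names and values are assumed to match the
interrupt status enum in the driver source and should be checked against the
version in use.  decode_intr_status() is a hypothetical helper, not a driver
function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed interrupt status bits; verify against the driver source. */
static const struct {
    uint16_t bit;
    const char *name;
} intr_names[] = {
    { 0x0001, "RxOK" },       { 0x0002, "RxErr" },
    { 0x0004, "TxOK" },       { 0x0008, "TxErr" },
    { 0x0010, "RxOverflow" }, { 0x0020, "RxUnderrun" },
    { 0x0040, "RxFIFOOver" }, { 0x4000, "PCSTimeout" },
    { 0x8000, "PCIErr" },
};

/*
 * Hypothetical helper: write the space-separated names of all set bits
 * into buf (len must be at least 1) and return buf.
 */
static char *decode_intr_status(uint16_t status, char *buf, size_t len)
{
    size_t i;

    buf[0] = '\0';
    for (i = 0; i < sizeof intr_names / sizeof intr_names[0]; i++) {
        if (!(status & intr_names[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, intr_names[i].name, len - strlen(buf) - 1);
    }
    return buf;
}
```

Under these assumptions, status 0x0011 is RxOK plus RxOverflow, and 0x0021 is
RxOK plus RxUnderrun.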

> eth0: RTL8139 Interrupt line blocked, status 4.
> eth0: RTL8139 Interrupt line blocked, status 5.

The R-T patches are obviously doing Bad Things.
Your kernel is not servicing interrupts in a timely manner.
This will cause packets to be dropped.  It shouldn't crash the driver,
but you should expect many packets to be dropped.

> eth0: Transmit timeout, status 0d 0000 media 00.

More badness.

> The "Abnormal interrupt" messages disappear when I increase
> RX_BUF_LEN_IDX to 3 (64K).  I think they come from receive buffer
> overruns, because the rttask uses much CPU time.

Yes.
R-T shouldn't be used to completely consume the CPU; it is intended to
be used to provide priority response to certain events.  To do this it
relies on having adequate average CPU to handle all pending tasks.
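
As a rough illustration of why the larger ring hides the problem, the sketch
below estimates how long the receive ring can absorb traffic with no interrupt
service.  It assumes the ring size is 8192 << RX_BUF_LEN_IDX, which matches
the 64K figure quoted for index 3, and it ignores framing overhead and the
per-packet header stored in the ring.  Both function names are hypothetical.

```c
/* Assumed relation between the index and the ring size in bytes. */
static unsigned long rx_buf_len(unsigned int idx)
{
    return 8192UL << idx;
}

/*
 * Rough estimate, in milliseconds, of how long a ring of the given
 * index takes to fill at a sustained receive rate in Mbit/s.
 */
static double rx_fill_time_ms(unsigned int idx, double mbit_per_s)
{
    double bytes_per_ms = mbit_per_s * 1e6 / 8.0 / 1000.0;

    return (double)rx_buf_len(idx) / bytes_per_ms;
}
```

At a sustained 100 Mbit/s this gives a little over 5 ms for index 3 and about
half that for index 2, so a task that holds off interrupt handling for longer
than that will overflow the ring whatever its size.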

> The "Interrupt line blocked" is strange... Could you please explain me
> the meaning of the "Check for bogusness" comment/code part?

It's intended to detect the case where the interrupt mapping is bogus,
or becomes bogus due to an old APIC bug.  SMP implies APIC, so that's
the SMP tie-in.


-- 
Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993