[realtek] 8139too performance - can you shed some light?

Donald Becker becker@scyld.com
Thu, 2 Aug 2001 13:31:29 -0400 (EDT)


On Thu, 2 Aug 2001, Ben Greear wrote:
> "Carlo E. Prelz" wrote:
> > 
> >         Subject: Re: [realtek] 8139too performance - can you shed some light?
> >         Date: Wed, Aug 01, 2001 at 03:00:17PM -0700
> > 
> > Quoting Ben Greear (greearb@candelatech.com):
> > 
> > > See if you can tell where the pkts are being dropped, and for what
> > > reason.  Take a look at the /proc/net/dev file on all the machines,
> > > for instance.

The number of packets dropped by the hardware, presumably because there
was no room in the Rx ring, shows up in
   stats.rx_missed_errors 
The number of packets dropped because there were no skbuffs available is
counted in
   stats.rx_dropped
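
To make the distinction concrete, here is a minimal user-space sketch of
how the two counters get bumped.  It is not code from the driver:
struct rx_stats, rx_one(), rx_account_missed() and the allocator hook
are hypothetical names that only mirror the two fields named above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the counters discussed above. */
struct rx_stats {
    unsigned long rx_packets;       /* frames handed up successfully */
    unsigned long rx_missed_errors; /* hardware had no room in the Rx ring */
    unsigned long rx_dropped;       /* no buffer available for the frame */
};

/* Hypothetical allocator hook, so the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t len);

/* An allocator that always fails, used to simulate buffer exhaustion. */
void *alloc_always_fails(size_t len)
{
    (void)len;
    return NULL;
}

/* Account for one received frame; returns its buffer, or NULL if dropped. */
void *rx_one(struct rx_stats *st, size_t len, alloc_fn alloc)
{
    void *buf;

    assert(st != NULL);
    buf = alloc(len);
    if (buf == NULL) {
        st->rx_dropped++;   /* software drop: no buffer to copy into */
        return NULL;
    }
    st->rx_packets++;
    return buf;
}

/* Fold a missed-frame count reported by the hardware into the stats. */
void rx_account_missed(struct rx_stats *st, unsigned long hw_missed)
{
    assert(st != NULL);
    st->rx_missed_errors += hw_missed;
}
```

A climbing rx_missed_errors says the ring filled before the driver got
to it.  A climbing rx_dropped says the driver ran but had no buffer to
put the packet in.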

> > I spent the whole day exploring the problem, and I reached the
> > conclusion that the bottleneck is in the processor..
> > .. Evidently the simple processing power
> > that is required to handle sustained 100Mbps traffic is too much for a
> > Pentium-mmx class cpu running at 233MHz.

> For what it's worth, I see about 20-40% CPU usage improvement when going
> from the realtek to the EEPRO NICs.  I'm not sure why, but it is definitely
> noticeable.

This is easy to explain.
The rtl8139 can only transmit from aligned buffers.
Most packets are unaligned after prepending the 14-byte Ethernet header,
so the rtl8139 driver must copy each packet to a bounce buffer before
transmitting.
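
The Tx-side logic can be sketched in user space as follows.  This is an
illustration, not the driver's code: the 4-byte alignment (TX_ALIGN),
the buffer size (TX_BUF_SZ), struct tx_slot and tx_prepare() are all
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TX_ALIGN  4     /* assumed alignment requirement, in bytes */
#define TX_BUF_SZ 1536  /* assumed size of one bounce buffer */

/* One Tx slot with its own bounce buffer; uint32_t storage keeps it aligned. */
struct tx_slot {
    uint32_t bounce[TX_BUF_SZ / sizeof(uint32_t)];
};

/* Nonzero if p satisfies the assumed alignment requirement. */
int tx_is_aligned(const void *p)
{
    return ((uintptr_t)p % TX_ALIGN) == 0;
}

/*
 * Return the address the hardware should transmit from.  An aligned
 * packet is used in place; an unaligned one is first copied into the
 * slot's bounce buffer.  *copied reports whether the copy was needed.
 */
const void *tx_prepare(struct tx_slot *slot, const void *pkt, size_t len,
                       int *copied)
{
    assert(len <= TX_BUF_SZ);
    if (tx_is_aligned(pkt)) {
        *copied = 0;
        return pkt;
    }
    memcpy(slot->bounce, pkt, len);   /* the extra copy that costs CPU */
    *copied = 1;
    return slot->bounce;
}
```

If the IP header sits on a 4-byte boundary, the frame starts 14 bytes
earlier, which is 2 bytes off that boundary.  That is why nearly every
packet takes the copy path.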

On the receive side, the rtl8139 receives into a contiguous ring.  We
can't work on the packets in the ring, so they must be copied to a
receive skbuff immediately.  We copy+sum in one step, but it's still an
extra copy.
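
A sketch of what "copy+sum in one step" means: one pass over the data
that both moves it out of the ring, handling the wrap at the end of the
ring, and accumulates the 16-bit ones'-complement sum.  The function
name and signature are hypothetical, and the byte-at-a-time loop is
written for clarity rather than speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes starting at offset start of a circular ring into the
 * linear buffer dst, and return the folded 16-bit ones'-complement sum
 * (not inverted) of the copied bytes, taken as big-endian 16-bit words.
 */
uint16_t ring_copy_and_sum(const unsigned char *ring, size_t ring_size,
                           size_t start, unsigned char *dst, size_t len)
{
    uint32_t sum = 0;
    size_t pos, i;

    assert(ring_size > 0 && len <= ring_size && len <= 65535);
    pos = start % ring_size;
    for (i = 0; i < len; i++) {
        unsigned char b = ring[pos];

        dst[i] = b;                             /* the copy */
        sum += (i & 1) ? b : ((uint32_t)b << 8); /* the sum, same pass */
        if (++pos == ring_size)
            pos = 0;                            /* wrap to ring start */
    }
    while (sum >> 16)                           /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}
```

Each byte is read once either way, so the checksum adds little to the
cost of the copy.  The copy itself is the overhead that a card
receiving directly into skbuffs avoids.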

The eepro100 can send from and receive into skbuffers without a copy.

If you know your architecture, you can likely tweak the copy code,
especially the Tx copy code, to get higher performance.  The driver uses
memcpy().  The optimized mmx-copy is likely much faster.

For those who wonder why we must immediately copy from the Rx ring
rather than operating on the data without copying:
consider an IP fragment that we must hold until the other fragments
arrive.  Its ring space could not be reused in the meantime.
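
A toy model of the problem, with a deliberately tiny ring.  All names
here are hypothetical.  The sketch simplifies by letting the writer
overwrite old data.  Real hardware would instead run out of room and
drop frames until the space is released, but either way the ring space
cannot be held while a fragment waits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 16  /* deliberately tiny, for illustration only */

struct rx_ring {
    unsigned char data[RING_SIZE];
    size_t head;  /* offset where the next frame will be written */
};

/* Simulate the NIC writing one frame; returns the offset it starts at. */
size_t ring_write(struct rx_ring *r, const void *frame, size_t len)
{
    const unsigned char *src = frame;
    size_t start = r->head;
    size_t i;

    assert(len <= RING_SIZE);
    for (i = 0; i < len; i++) {
        r->data[r->head] = src[i];
        r->head = (r->head + 1) % RING_SIZE;
    }
    return start;
}

/*
 * Hold a fragment two ways, by pointer into the ring and by copy, then
 * let later traffic wrap around.  Returns 1 if the copy is intact while
 * the in-ring data has been overwritten.
 */
int fragment_overwritten_demo(void)
{
    struct rx_ring r = { {0}, 0 };
    unsigned char held_copy[8];
    const unsigned char *in_ring;
    size_t off;

    off = ring_write(&r, "FRAG-ONE", 8);
    in_ring = &r.data[off];
    memcpy(held_copy, in_ring, 8);   /* what the driver does */

    ring_write(&r, "12345678", 8);   /* fills the rest of the ring */
    ring_write(&r, "ABCDEFGH", 8);   /* wraps onto the first fragment */

    return memcmp(held_copy, "FRAG-ONE", 8) == 0 &&
           memcmp(in_ring, "FRAG-ONE", 8) != 0;
}
```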

Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993