[vortex-bug] DPD FSH for Cyclone/Tornado

Bogdan Costescu Bogdan.Costescu@IWR.Uni-Heidelberg.De
Tue, 16 May 2000 20:48:26 +0200 (CEST)


On Tue, 16 May 2000, Donald Becker wrote:

> The new format is backward-compatibe.

Yes, that's what I wrote in my message...

> Your suggested change
> ________________
> 	vp->tx_ring[entry].addr = virt_to_le32desc(skb->data);
> 	vp->tx_ring[entry].length = cpu_to_le32(skb->len | LAST_FRAG);
> 
> 	if (vp->drv_flags & IS_BOOMERANG) {
> 	   	vp->tx_ring[entry].status = cpu_to_le32(skb->len | TxIntrUploaded);
>            ...
> 		outw(DownStall, ioaddr + EL3_CMD);
>            ...
> 	} else {
> 		vp->tx_ring[entry].status = cpu_to_le32(skb->len);

You got this wrong:---------------------------------------^^^^^^
I suggested:
		vp->tx_ring[entry].status = cpu_to_le32(TxIntrUploaded);

There is no meaning of setting the length in this field, the card does not
use it; in fact this field can be used to set a PktID (8 bit only) which
might be used as an index in tx_ring to look-up a packet without reading
from DnListPtr in a loop (I suggested this already in a previous
message) - like in vortex_interrupt.

> They look similar, but the calculation of
>       if (vp->drv_flags & IS_BOOMERANG)
> adds instructions to shuffle values on the register-poor x86.

I don't quite understand this: the test is done in both cases, you only
make the code on each branch a little longer. This does affect the cache
impact.

> Hmmm, now that I think about it, beyond the cache impact of loading the few
> extra instruction there likely is no performance impact.

Agreed. Another ideea would be to have separate routines for Boomerang and
Cyclone/Tornado where no "if" is required. The driver grows a bit, but the
cache impact is reduced.

> It *is* important to make certain vp->tx_ring[entry] is written as a whole
> cache line, rather than as individual words, but the code path change
> should still permit that to be done.

I assume that you mean the writting in data cache and not the cacheing of
the code that writes. But there is still something that I don't
understand: why are you so picky now about this when there are other
sub-optimal things in the driver? Is a cache fault so much worse that I/O
operations that we need anyway when dealing with H/W devices?
I'm thinking here of the tbusy flag which has no special use AFAICS. On
the netdev list I learned that hard_start_xmit (which is
boomerang_start_xmit in this case) is externally protected WRT
re-entrance. The test for calling vortex_tx_timeout can be done later
(when you check for full ring) and that's about everything that is needed.
And anyway, tbusy and tx_full are almost always in sync. (I said almost
because I haven't looked for all cases; in most of them their are in
sync). If this flag is eliminated, the code becomes shorter in several
places.

Sincerely,

Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De