[vortex] Re: 3com 3c905c-txm

Mon, 15 May 2000 15:50:33 +0200 (CEST)

On Sun, 14 May 2000, Andrew Morton wrote:

> > As I mentioned in the previous reply to the linux-vortex mailing list, the
> > 3c59x.c code in 2.3.99 is completely bogus.
> You said it was wrong, but you left me guessing as to precisely why.
> My guess is that you were referring to the Cyclone no-DnStall, no inw/outw code path?

Which is bogus in itself. The docs for 90xC, page 90:
	"... the host must stall the NIC before modifying the downlist or
	writting a new value to the DnListPtr register (unless the value
	is already zero). This is accomplished by issuing a DnStall
	command..."
(I hope that I'm not infringing copyright here...)
So, DnStall/DnUnstall have to be done.

> Now, I really need a hint here.  I am aware of and have looked at your optimisations in post-0.99M versions.  The problem is that I cannot correlate these with the 3Com
> docs.  3c90xc.pdf and 3c90xb.pdf both say "Use DnStall unless you're polling DPDs".  We are not polling DPDs.  I suspect that there is some piece of information which both
> you and Bodgan have which I have missed.

No, I don't have any additional info and I agree with Andrew. In fact,
modifying the Tx part for use of polling (DnPoll register, page 100) was
included on my list of specific improvements for C cards (it also exists
on B cards, but I only have 4 of them which cannot be used for testing
right now). If the polling mode is enabled, Donald's Tx path is correct
WRT lack of I/O operations. But there is at least one other problem that
has to be solved: after you add the new DPD to the list, you have no
indication when the download takes place - you only know that the time is
less than the polling interval. However, cur_tx and tx_full are modified
_after_ adding the new DPD to the list which opens a window where a
similar race to what I posted about 3 weeks ago is possible. (And to
answer to Donald's query: this is well known code - 0.99L with TX_ &
RX_RING_SIZE = 2 and I can reproduce the race in about 2 minutes under
load).
That's why I suggested (with the current DnStall/DnUnstall procedure) to
move the DnUnstall to after updating these variables; this way no download
can take place while the variables are updated, no interrupt with
DownComplete status is produced and the race is avoided.
If you move the update of these variables to _before_ adding the DPD to
the list (with polling enabled), there may be other races - I haven't yet
looked into it and that's the reason for which I haven't proposed this to
Andrew.

> Ditto all the above for your memory-space vs I/O-space optimisation.

I tried changing to mem-space Andrew's driver. I don't have a complete
working code, because I was only interested in my case to see if there are
real performance improvements - so I only initialize it and use it, no
clean-up. AFAICS from my tests, there are no improvements, but my tests
are user-level.
That's why I also have thought of another way to avoid card access at all,
which is in fact covered in the docs - using the dnComplete bit in DPD's
status header (page 86 and later) which is normal memory access to replace
access to DnListPtr card register. However, B and C cards have different
download sequences (page 92 in C docs, page 101 in B docs) and also no
direct contact to the card means that we have to be very careful about
pointers - for these reasons I'm still working on it and haven't fed it to
Andrew.

> > The DownStall is only useful on the Boomerang, not the later chips.
> > Using DownStall with later chips will result in exactly what you are seeing
> > -- hundred or thousands of wasted PCI transactions for each packet.
> I have not done the histogram, but the vast majority of times it is a couple of cycles.  The problem is that the duration appears to be unbounded on collisiony LANs.  I
> would really appreciate your help in understanding this issue.

In my case, I was using a value of 600 (present in Andrew's driver) in
boomerang_start_xmit when the problems appeared. This is close to "some
hundreds" that Andrew reported; upping it to 4000 and moving it into a
separate function (so, adding a jmp/ret to the code) prevented it. I 
didn't say that I fully understand what is going on, I suggested an
explanation to which Andrew agreed and presented in his last messages.

> > The later chips do not need any spinlocking with a properly designed
> > transmit routine.  Especially not the painfully slow 'irqsave' versions of
> > the spinlocks.
> How so?  spin_lock_irqsave() is six instructions if the lock is free.

What does "properly designed transmit routine" mean? In Donald's drivers,
cur_tx is updated in start_xmit and dirty_tx is updated in the ISR;
everything fine up to now. But some lines later, in both cases, a
difference is computed out of the two and then tx_full is set in 
start_xmit which is also used in the ISR as a condition. This is subject
to race and I pointed it. I don't think that with the existing
structure the locking can be avoided; I thought of removing tx_full
and always compare the difference (cur_tx - dirty_tx) with TX_RING_SIZE,
but I didn't have time to see what are the implications.

> > I shouldn't be pointing out specific changes that were bad, because people
> > will assume that the rest of the changes were OK.  Generally, if one set of
> > changes are badly designed, it means that all of the changes should be
> > reconsidered.
> > As I said above, it's mainly _lack_ of changes at issue here (I think).

I would _really_ like to have oppinions about _specific_ changes and if
(idealy) all the changes receive comments there will be no "rest to be
OK". Because 0.99H/L/M/N drivers have the above-mentioned race, it doesn't
mean that everything in these drivers is wrong!

> Yes, I have made changes.  We believe we have identified several SMP and IRQ and CPU/hardware races as well as a couple of media selection issues.  I have been careful to
> discuss these on lists which you are known to frequent.  The changes are carefully documented.  The few comments you have made have, of course, been invaluable.   More such
> comments would be equally valuable.
> After all, we are all on the same side here.

Absolutely!

Sincerely,

Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De