Another 2.2 version of 3c59x.c

Andrew Morton andrewm@uow.edu.au
Wed Apr 26 22:57:04 2000


Bogdan Costescu wrote:
> 
> On Thu, 27 Apr 2000, Andrew Morton wrote:
> 
> > "Crash" how?  Do we know that it was driver-related?
> 
> That's the problem, I'm not sure. However with the same kernel and 0.99L
> it does not appear, at least on the same time-scale. The crashes that I
> experienced were with a flooding ping over the weekend (so I don't
> really know when it happened) and with my parallel jobs where the time to
> crash varies between several minutes and several hours.

Nasty.  Did you try patching up the kernel?  2.2.15, perhaps?

> > Spent most of the day working on the "tx timeout" problem.  It's being
> > caused by 16 successive collisions.  You won't see it on switched
> > ethernet, of course.
> >
> > I fixed it by resetting the transmitter and fiddling with the DMA engine
> > when it happens.  3Com assert that 'TxEnable' is enough to recover from
> > maxCollisions, but it isn't so.  TxEnable works most of the time on a
> > 905B and _never_ on a 575.
> 
> I see an error message related to this at the beginning of
> vortex_tx_timeout, but I don't remember you saying that the error message
> appears. However, vortex_tx_timeout is called only when tbusy is set which
> is done based on the Tx ring and not on the hardware state, so this is
> probably the explanation: you send only a few packets which do not fill the
> ring, but while sending the collisions occur. With the current
> implementation, the packet which caused the collision is discarded (as
> stated in the docs) and probably the upper levels (TCP) recover at a later
> time if they care.
> vortex_tx_timeout seems to recover from this situation by doing TxReset
> and TxEnable which looks like what you described, but the packet is lost,
> there is no attempt to resubmit it. However, resubmitting the packet does
> not seem like an easy job, because by the time you do this another packet
> might be "on the wire", which means that you need to stop the transmitter.
> And anyway, is there any assurance that you catch this event before the
> transmission of the next packet is finished?

The idea is to prevent vortex_tx_timeout() from being called at all.

I did this by recovering from maxCollisions within vortex_error() (as
soon as the hardware detects the problem), rather than 0.4 seconds
later, in vortex_tx_timeout().

The issue was that maxCollisions was stopping the transmitter,
vortex_error() was not recovering from it, and recovery only happened
later, when we hit vortex_tx_timeout().
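
A minimal sketch of that ordering, for illustration only. It models the
card in plain C instead of touching hardware, and it is not the actual
3c59x.c change: struct sim_nic, SIM_TX_MAX_COLLISIONS and all of the
sim_* helpers are hypothetical names, and the DMA engine adjustments are
left out. The "tx_stuck" flag is an assumption that stands in for a card
on which TxEnable alone does not restart the transmitter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status bit for this sketch; not taken from 3Com's docs. */
#define SIM_TX_MAX_COLLISIONS 0x08u

/* Simulated card state. A stand-in for the real hardware registers. */
struct sim_nic {
    unsigned tx_status;  /* pending transmit status bits */
    bool tx_running;     /* is the transmitter currently running? */
    bool tx_stuck;       /* assumption: enable alone will not restart it */
    int resets;          /* how many transmitter resets were issued */
};

/* Model 16 successive collisions: the transmitter stops. */
static void sim_max_collisions(struct sim_nic *nic, bool stuck)
{
    nic->tx_status |= SIM_TX_MAX_COLLISIONS;
    nic->tx_running = false;
    nic->tx_stuck = stuck;
}

/* Model of a transmitter reset: clears the stuck state, leaves Tx off. */
static void sim_tx_reset(struct sim_nic *nic)
{
    nic->tx_stuck = false;
    nic->tx_running = false;
    nic->resets++;
}

/* Model of a transmitter enable: has no effect while the card is stuck. */
static void sim_tx_enable(struct sim_nic *nic)
{
    if (!nic->tx_stuck)
        nic->tx_running = true;
}

/*
 * Error-path recovery: as soon as max collisions is reported, acknowledge
 * the status, reset the transmitter and re-enable it, instead of waiting
 * for a transmit timeout. Returns true if the transmitter is running.
 */
static bool sim_handle_tx_error(struct sim_nic *nic)
{
    assert(nic != NULL);

    if (!(nic->tx_status & SIM_TX_MAX_COLLISIONS))
        return nic->tx_running;  /* nothing to recover from */

    nic->tx_status = 0;          /* acknowledge the error */
    sim_tx_reset(nic);           /* reset first ... */
    sim_tx_enable(nic);          /* ... then enable */
    return nic->tx_running;
}
```

In this model, calling sim_tx_enable() on a stuck card leaves tx_running
false, which is the "twiddle thumbs until timeout" case; calling
sim_handle_tx_error() from the error path restarts it straight away at
the cost of one reset.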

> What do you mean by "TxEnable works _most_ of the time on a 905B" ?

The 90x spec says that you can recover from maxCollisions simply by
reenabling the transmitter with TxEnable.  On my 905B this usually
works, but occasionally TxEnable fails to restart the transmitter and we
twiddle thumbs until timeout expiry.  On the 575 this happens all the
time.

It could be that I've missed the mark and there is some other subtle
timing problem causing this.  I can't see it though.

I should try 3Com's driver - it looks as though it will have the same
problem.  I'll do that.

-- 
-akpm-