[vortex] Re: transmit timeouts wi Cyclone (fwd)

Grant Basham grant@rsmas.miami.edu
Thu, 10 Aug 2000 15:20:19 -0400 (EDT)


The nodes are linked through Foundry FastIron series switches running
Foundry's software version 07.0.05T41 (May 2000).  I see the problem when
testing pairs of nodes on the same and on different hubs.

On my 99H production node that was dying from "too much work at interrupt",
I installed version 99Q and raised max_interrupt_work from 20 to 100.  It
logged a handful of transmit timeouts (seemingly the same as in the original
posting) but recovered without apparent incident.  There should not be a
heavy load on this node from the Alphas; it is a news server and does most
of its traffic on the WAN.  I did not see any more "too much work" messages.

--Grant

  Grant Basham       (305)361-4026       University of Miami
  grant@rsmas.miami.edu      RSMAS Computer Facility/Systems

---------- Forwarded message ----------
Date: Thu, 10 Aug 2000 14:32:13 +0200 (CEST)
From: Bogdan Costescu <Bogdan.Costescu@IWR.Uni-Heidelberg.De>
To: grant basham <grant@rsmas.miami.edu>
Cc: vortex@scyld.com
Subject: Re: [vortex] transmit timeouts wi Cyclone


On Wed, 9 Aug 2000, grant basham wrote:

> NIC: 3c905B Cyclone 100baseTx
> Linux: i386 (2.2.12 and 2.2.15) with 3c59x.c:v0.99Qk 7/5/2000
> 
> Target: Compaq/Dec Alpha Tru64 Unix Ver(4.0 and 5.0) with
> 	  tu0: DECchip 21143: Revision: 3.0 (a Tulip chip),
> 	  configured for 100Mbps/full or half duplex.

How are these computers connected?

> Under load with the 99Q driver (running netperf -H Target) I get transmit
> timeouts that require a system restart to fully clear.  I can kill the
> netperf and down the interface to stop the transmit timeout errors.  Same
> problem with the test 99Ra driver.

These drivers use an advanced facility for Cyclone and Tornado chips - Tx
polling mode. I haven't fully understood the new code for this, so maybe
Donald is able to say more about the behaviour in the Tx timeout
situation.

> I get good performance in the same test against Linux targets running both
> 3Com905 and Tulip chips, and against Tru64Unix systems set to 10Mbps/half
> duplex

This is a bit strange. It's like the Tru64 machine overloads the
connecting device (hub/switch) when in 100Mbps mode.

> The 99H driver does not give these errors, but with that driver, a
> production node has died a couple times this week from "too much work at
> interrupt" with the subsequent failure to restart.

This message appears when the node is too slow to process incoming 
packets. But AFAIK, after issuing the message, it should just go on.

You can also try the driver at:
	http://www.uow.edu.au/~andrewm/linux/
which has some modifications in this area.

Sincerely,
Bogdan Costescu
...
E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De