[vortex] can't unload module

Andrew Morton andrewm@uow.edu.au
Sat, 23 Sep 2000 23:30:59 +1100


David Fries wrote:
> 
> 'net drop out' problem,
> There are two stages, reduced network and no network.  For example
> when I do a `ping -s 15000 aerospace` ping from spacedout (troubled
> computer) to aerospace (another one), I'll get response times of
> either 4ms or 3000ms.
> 
> When networking stops I don't get any packets received or interrupts,
> but I am showing RX overruns incrementing.  When I ping from
> spacedout, spacedout shows an arp request going out, aerospace sees
> the arp request, but spacedout never sees the reply.

This is consistent with an interrupt controller failure.  However, if
this were the case you should be seeing "NETDEV WATCHDOG: eth0: transmit
timed out" messages and "interrupt posted but not delivered" messages.
Are you sure you're not?

Another test: when spacedout is in this state, go to its console and
ping another machine.  Watch /proc/interrupts to see if you're getting
Tx interrupts.
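
A rough sketch of that check, assuming the NIC shows up as eth0 in
/proc/interrupts.  The file keeps one counter per IRQ line and does not
separate Tx from Rx, so if receive is dead and the count still climbs
while you ping, those are most likely transmit interrupts.  The function
name irq_count and its optional file argument are made up for this
illustration.

```shell
#!/bin/sh
# irq_count DEVICE [FILE]
# Print the interrupt count for the line that names DEVICE, summed over all
# CPUs.  FILE defaults to /proc/interrupts; the optional argument exists only
# so the parser can be tried on a saved copy.
irq_count() {
    awk -v dev="$1" '
        index($0, dev) {
            total = 0
            # Field 1 is the IRQ number ("11:").  The purely numeric fields
            # that follow are the per-CPU counts.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                total += $i
            print total
            exit
        }' "${2:-/proc/interrupts}"
}

# Example (run on spacedout while a ping is going out):
#   before=$(irq_count eth0); sleep 5; after=$(irq_count eth0)
#   echo "eth0 interrupts in 5 seconds: $((after - before))"
```

A difference of zero while the ping is running would point at interrupts
not being delivered at all.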

If you are getting tx interrupts then perhaps the NIC is getting its
registers unprogrammed, or perhaps the multicast filter has gone silly. 
Try `ifconfig eth0 promisc'.

Or try a new PCI slot.

Or a new power supply.

Or a new computer.

BTW, I'm currently typing on a K6-2 machine (wildly overclocked - this
is my main workstation/router/firewall/server :)).  It's running
2.4.0-test8-pre1 with a 3c905B.  Solid as a rock.  Different motherboard
manufacturer: Gigabyte.

> I'm not sure.  I think it should work, but it would depend on your mount
> options.

OK, I was asking because this problem is related to IP fragmentation,
and I assume (perhaps wrongly) that if rsize and wsize are larger than
your MTU, there will be a lot of fragmented packets.
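
To put a number on that, here is a back-of-the-envelope sketch.  It
assumes plain IPv4 with a 20-byte header (no options) and a 1500-byte
Ethernet MTU; frag_count is a name invented for this example.

```shell
#!/bin/sh
# frag_count IP_PAYLOAD_BYTES [MTU]
# Print how many IPv4 fragments a datagram with the given payload needs.
# Assumes a 20-byte IP header (no options).
frag_count() {
    payload=$1
    mtu=${2:-1500}
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_frag=$(( (mtu - 20) / 8 * 8 ))
    echo $(( (payload + per_frag - 1) / per_frag ))
}

# `ping -s 15000` sends 15000 data bytes plus an 8-byte ICMP header.
frag_count $((15000 + 8))    # prints 11 for a 1500-byte MTU
```

By the same arithmetic, an 8192-byte NFS transfer carried over UDP would
need at least 6 fragments per request or reply.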

> > Are you able to provide a set of steps with which others can reproduce
> > this?
> 
> 'net drop out'
> I'll just say no.  AeroSpace is running SMP, spacedout is not SMP.
> AeroSpace is a dual Pentium MMX, Spacedout is a K6-2.  They have
> basically identical network cards in them (3c905B).  I have swapped the
> network cards in the past and the problems follow the computer not the
> card.
> 
> I would suggest getting a FIC VA 503+ motherboard, K6-2 processor,
> 3c905B network card, go in X, have something rapidly updating the
> video card (rxvt doing `locate \*` worked fine), and send a ton of
> network data to the system at 100BaseT.

I just did that here:

	ping -q -f -s 64 -l 100000 bix

This caused `bix' to take a short trip to an alternate universe, but it
recovered fine when I killed the ping.

> If you REALLY pulled my leg you might get me to put one of my Pentium
> processors in the system, but I would rather not do that.

Sorry, I think you need to start swapping hardware in spacedout.  It's
sick.

> The new problem about 'unregister_netdevice: waiting ...' I can
> reproduce it by,
> insmod 3c59x
> ifconfig eth0 ...
> (on another console) ping -s 15000 -f aerospace
> ifconfig eth0 down; rmmod 3c59x
> 
> That usually gives about two lines of 'unregister_netdevice...' before
> the module is able to be removed.

That's normal.  There are orphaned IP fragments floating about in your
kernel.  They have a thirty second lifetime.  When they have all expired
the module unload is allowed to proceed.
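
If the messages bother you, one possible workaround is to wait out the
fragment lifetime before unloading.  This is only a sketch: it assumes
the timeout is exposed in /proc/sys/net/ipv4/ipfrag_time on your kernel
and falls back to thirty seconds otherwise.  Both function names are
hypothetical.

```shell
#!/bin/sh
# frag_lifetime [FILE]
# Print the IP fragment reassembly timeout in seconds.  Falls back to 30
# (the figure quoted above) if the sysctl file cannot be read.
frag_lifetime() {
    cat "${1:-/proc/sys/net/ipv4/ipfrag_time}" 2>/dev/null || echo 30
}

# unload_after_expiry IFACE MODULE   (hypothetical helper, needs root)
# Take the interface down, wait out the fragment lifetime, then unload.
unload_after_expiry() {
    ifconfig "$1" down || return 1
    sleep "$(frag_lifetime)"
    rmmod "$2"
}

# Example:  unload_after_expiry eth0 3c59x
```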

> Odd thing about the 'unregister_netdevice' problem is I was still able
> to unload the module until I inserted my ne2000 card and ifconfiged it
> up.
> 
> I did,
> insmod 3c59x
> modprobe ne io=0x300 irq=111
> ifconfig eth0 ...
> ifconfig eth1 ...
> ifconfig eth0 down
> rmmod 3c59x
> and it kept giving the 'unregister_netdevice' message over and over until
> I rebooted.

I tried many combinations of this with an eepro100 and a 3c905C.
Everything worked fine.  Sigh.

[ In reply to a later email ]

> What does rmmod and insmod do to the network card that vortex_down,
> vortex_up doesn't?  Something is different.

All the stuff in vortex_probe1() is run at insmod-time only.  It's
mainly driver data structure initialisation, but there's some hardware
initialisation as well.


David, if this problem is purely exhibited on `spacedout' then it's
quite possible that there are no software problems, although that
unregister_netdevice problem sure looks like software to me...   My
recommendation is to start swapping out hardware.  You get some
amazingly weird stuff happening if the hardware is dodgy.