more NetGear missing ARPs

Neale Banks neale@lowendale.com.au
Sat Jun 12 09:49:53 1999


Hi Keith,

On Sat, 12 Jun 1999, Keith Owens wrote:

> On Fri, 11 Jun 1999 21:28:46 +1000 (EST), 
> Neale Banks <neale@lowendale.com.au> wrote:
> >As others have suggested, it's as though the arp reply is either not seen
> >by the card or not delivered up by the driver.
> 
> Does switching the card in and out of promiscuous mode have any effect?
> "ifconfig eth0 promisc", try the Cisco, "ifconfig eth0 -promisc", try
> Cisco again.

In failing to recreate this problem earlier today, I think I have
corroborated an earlier observation of <bart@vianet.net.au>:

	"It seems to only fail if there is net activity on the box Im
	trying to ping even if its small say even 5-10k a sec,"

This morning (being Saturday) I just could not get this problem to perform
- then I remembered the above, and that there isn't much traffic in the
office on a Saturday.  This evening I have had it perform (i.e.
incomplete ARP) 5/5 times by first setting up a 1400-byte ping to the
Cisco from the other Linux host on the subnet.  Can anyone hazard a guess
as to the significance of this?  It also possibly means that on a switch
you may be far less likely to see this problem (and with a 100Mb card
there's a fair chance of being on a switch?).

Regarding switching in and out of promiscuous mode:

I've now booted a 6th time and got the incomplete ARP again.  Putting eth0
into promiscuous mode cleared the problem straight away; I "arp -d"ed the
entry and it re-appeared on a (successful) ping of the Cisco.  I then took
the card back out of promiscuous mode and the ARPing appears to remain
happy.  You may be on to something here :-)))

7th reboot: initially "incomplete" ARP; setting promisc didn't immediately
help, so I deleted the "incomplete" entry and could then happily ping the
Cisco.  I turned off promisc, deleted the ARP entry and could still ping
the Cisco (but there was a brief appearance of the incomplete ARP entry -
could that be just that this ARP took a while to complete?).
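
For anyone wanting to script the "is the entry incomplete?" check between
reboots rather than eyeball the arp output, here is a minimal sketch.  It
assumes the /proc/net/arp column layout of current Linux kernels (IP
address, HW type, Flags, HW address, Mask, Device) and that an entry is
incomplete when the ATF_COM bit (0x02) is clear in Flags.  The function
names and the sample addresses are made up for illustration; they are not
taken from the machines discussed here.

```python
ATF_COM = 0x02  # "completed entry" flag bit, as defined in <linux/if_arp.h>


def incomplete_arp_entries(arp_table_text):
    """Return (ip, device) pairs for entries whose ATF_COM flag is clear.

    arp_table_text is the full text of /proc/net/arp, header line included.
    """
    incomplete = []
    for line in arp_table_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or truncated lines
        ip, _hw_type, flags, _hw_addr, _mask, device = fields[:6]
        if not int(flags, 16) & ATF_COM:
            incomplete.append((ip, device))
    return incomplete


def read_proc_arp(path="/proc/net/arp"):
    """Read the kernel ARP table as text (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Hypothetical sample: the first entry is unresolved, the second is resolved.
SAMPLE = (
    "IP address       HW type     Flags       HW address            Mask     Device\n"
    "192.168.1.1      0x1         0x0         00:00:00:00:00:00     *        eth0\n"
    "192.168.1.20     0x1         0x2         00:a0:cc:12:34:56     *        eth0\n"
)

if __name__ == "__main__":
    for ip, device in incomplete_arp_entries(SAMPLE):
        print("incomplete ARP entry for %s on %s" % (ip, device))
```

On a live box, passing read_proc_arp() instead of SAMPLE before and after
each promisc toggle would record whether the Cisco's entry completed.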

For the record, the 1400-byte ping was still running through all of this
(seq# > 4000 now).

> If that works, it could be the problem I saw with Xircom RBEM56G.  The
> TX/RX rings get confused when CSR6 is changed too quickly and the MAC
> filters are not setup correctly.  If promisc on/off works, try tulip.c
> from ftp://ftp.ocs.com.au/pub/xircom-RBEM56G-howto-2.tar.gz.  This
> patched tulip 0.91 is mainly intended for RBEM56G but the CSR6 fix
> should work on other tulip cards, set strict_csr6=1.

OK, grabbed that, then built and installed a new kernel-image package (tulip
as a module).  Re-started the 1400-byte ping; test reboot - phew, it came
up OK and (unsurprisingly) still has the incomplete ARP problem.  Changed
/etc/modules so that the tulip line is "tulip strict_csr6=1".  Rebooted
and I still have the incomplete ARP :-(

OK, just to be sure, I've changed
	static int strict_csr6 = 0;
to:
	static int strict_csr6 = 1;
in tulip.c, and rebuilt.  Rebooted and still incomplete ARP. {:-(

Checking dmesg: yes we have "tulip.c:v0.91 4/14/99
becker@cesdis.gsfc.nasa.gov (modified by danilo@cs.uni-magdeburg.de for
XIRCOM CBE, kaos@ocs.com.au for RBEM56G (-2))".  Also, setting promiscuous
mode still fixes it and resetting out of promiscuous mode still allows
successful ARP.

One final cross-check: I've stopped the 1400-byte pings and am rebooting -
up it came and can ARP the Cisco happily.

What else can we try?

Thanks,
Neale.