More NetGear missing ARPs

Keith Owens kaos@ocs.com.au
Sat Jun 12 21:35:35 1999


On Sat, 12 Jun 1999 23:55:19 +1000 (EST), 
Neale Banks <neale@lowendale.com.au> wrote:
>This evening I have now had it perform (i.e.
>incomplete ARP) 5/5 times by first setting up a 1400 byte ping to the
>Cisco from the other Linux host on the subnet.
>
>One final cross-check: I've stopped the 1400-byte pings and am rebooting -
>up it came and can ARP the Cisco happily.

Let me see if I have this right.  Three machines on a LAN without a
switch in the way.

A - Another box.
B - Bad tulip box.
C - Cisco.

ping -s 1400 from A to C stops B from completing an ARP for C.
Correct?

That is seriously weird.  The first thing to test is what A and B can each see on the wire.
Try this sequence.  Use IP addresses, not host names so we don't have
DNS traffic getting in the way.

B /etc/rc.d/init.d/network stop (or your distribution's equivalent)
B rmmod tulip (makes absolutely sure there is no residual state)
A arp -d B
A arp -an (should have no entry for B)
C clear arp-cache (I think that is the Cisco command)
C show arp (should have no entry for B)
A ping -s 1400 C (leave it running)
A tcpdump -nleieth0 arp or icmp | tee /var/tmp/Alog (promisc)
B modprobe tulip
B /etc/rc.d/init.d/network start
B tcpdump -nleieth0 -p arp or icmp | tee /var/tmp/Blog (not promisc)
B ifconfig >> /var/tmp/Blog
B ping -c 3 C
B kill tcpdump
B (ifconfig ; arp -an) >> /var/tmp/Blog
A kill tcpdump, ping
C show arp (what does it show for B?)
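
The captures from the steps above can be compared by counting, in each log, B's ARP requests for C and C's replies. The sketch below is one hedged way to do that; arp_summary is a made-up name, and it assumes the logs were written by tcpdump -n (numeric addresses) with the usual "who-has X tell Y" and "X is-at MAC" wording, which varies a little between tcpdump versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of tcpdump): count B's ARP requests for C
# and C's ARP replies in a log written by "tcpdump -n ... | tee LOG".
# Usage: arp_summary LOG B_IP C_IP
arp_summary() {
    sum_log=$1
    # Escape the dots so each address matches literally in the regex.
    sum_b=$(printf '%s\n' "$2" | sed 's/\./\\./g')
    sum_c=$(printf '%s\n' "$3" | sed 's/\./\\./g')
    # "who-has C tell B": B asking for C's hardware address.  Some tcpdump
    # versions print "(MAC)" after C, hence the optional group.  The trailing
    # group stops 10.0.0.2 from also matching 10.0.0.20.
    sum_req=$(grep -Eci "who-has ${sum_c}( \([^)]*\))? tell ${sum_b}([^0-9.]|$)" "$sum_log")
    # "C is-at MAC": C answering with its hardware address.
    sum_rep=$(grep -Eci "(^|[^0-9.])${sum_c} is-at" "$sum_log")
    echo "requests=$sum_req replies=$sum_rep"
}
```

For example, run arp_summary /var/tmp/Alog B_IP C_IP and the same against /var/tmp/Blog, substituting the real addresses for the placeholders.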

Repeat until you reproduce the problem.  Alog and Blog will tell us
whether the problem is on the Tx or the Rx side of the failing tulip.
From your description it sounds like Rx is getting confused, but I
cannot see how at the moment, so I want to confirm it.
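
Given ARP counts from the two logs, the Tx/Rx question reduces to a short decision. The sketch below (classify is a hypothetical name) encodes one reading of it, assuming that B's own tcpdump sees its outgoing frames before the driver transmits them, and that A, in promiscuous mode with no switch in the way, sees C's unicast replies to B.

```shell
#!/bin/sh
# Hypothetical helper: guess where the ARP exchange between B and C breaks.
# Arguments are counts taken from A's promiscuous log and B's own log.
# Usage: classify A_REQ A_REP B_REQ B_REP
classify() {
    cl_a_req=$1 cl_a_rep=$2 cl_b_req=$3 cl_b_rep=$4
    if [ "$cl_b_req" -eq 0 ]; then
        # B's capture shows no request at all: ARP was never attempted.
        echo "none: B never sent an ARP request for C"
    elif [ "$cl_a_req" -eq 0 ]; then
        # B queued the request but nobody else saw it on the wire.
        echo "tx: B queued requests but none reached the wire"
    elif [ "$cl_a_rep" -eq 0 ]; then
        # The request was on the wire but no reply followed.
        echo "peer: requests reached the wire but C did not answer"
    elif [ "$cl_b_rep" -eq 0 ]; then
        # A saw C's replies, so they were on the wire, yet B logged none.
        echo "rx: C answered but B never received the replies"
    else
        echo "ok: B received C's replies"
    fi
}
```

The checks run in order, so the first missing link is the one reported; a run where A sees both requests and replies while B logs no replies is the Rx case suspected above.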