where is k_compat.h?

James Ralston <qralston+ml.linux-vortex-bug@andrew.cmu.edu>
Thu Feb 3 17:23:33 2000


Paul,

> I create the problem by displaying a series of simple gnuplot
> x-windows one per second (which isn't really heavy traffic at all)
> on any of several machines connected to either the same 100 Mbps,
> another 100 Mbps, or a 10 Mbps switch.

Actually, you'd be surprised just how much overhead the X11 protocol
has.  ;)

Seriously, though, since the X11 protocol uses TCP/IP, I'd expect it
to be affected by whatever these problems are.
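
For anyone who wants to see this concretely: a remote DISPLAY is just
a TCP connection to port 6000 plus the display number.  A quick
sketch, with "otherhost" standing in for any machine running an X
server:

    $ DISPLAY=otherhost:0 xterm &     # X traffic rides an ordinary TCP stream
    $ netstat -tn | grep :6000        # ...and shows up as a normal connection

So gnuplot's once-a-second windows exercise exactly the TCP path that
seems to be misbehaving here.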

> An interesting thing is one of the machines on another 100 Mbps
> switch is a 450 Mhz PIII with another 3c905b-tx-nm NIC but purchased
> several months ago.  That machine works great.

Now that *is* interesting.  This machine is connected in full duplex
mode?  And it can talk to other machines on the same switch, which are
also connected to the switch in full duplex mode?

> The problem also occurs during "mount -t nfs -a" of 9 disks on
> various machines.

I'm assuming this is NFS version 3 over TCP/IP, yes?

> If I ping from the stalling machine to another, or vice versa, the
> packet loss rate is around 20% and those that survive report times
> about 10000 times normal.

I wouldn't expect ICMP packets to be affected, as they are layered
directly on top of IP and bypass TCP entirely.  Hmmm.
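
If you want to quantify the loss cleanly from the stalling machine,
plain ping will do it; "otherhost" is again a stand-in for any peer
on the same switch:

    $ ping -c 100 -q otherhost     # -q prints only the summary;
                                   # watch the packet loss and rtt figures

Loss showing up even here is a strong hint that the problem lives at
the link/duplex layer rather than in the TCP code.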

> > Try turning off the RFC1323/RFC2018 features of the Linux 2.2.*
> > kernel, by doing the following:
> > 
> >     # cd /proc/sys/net/ipv4
> >     # echo 0 >tcp_sack
> >     # echo 0 >tcp_timestamps
> >     # echo 0 >tcp_window_scaling
> > 
> > I suspect doing so will prevent the connection from stalling
> > entirely, but the throughput will probably still slow to a crawl.
> 
> Your suspicion was correct.  This does prevent the complete
> stalling, but the slowing does occur.  The ping statistics that I
> described above hold in this case too.  Roughly 20% packet loss with
> times 10000 times normal.

I thought so.  The optimizations in RFC1323 and RFC2018 (which deal
with tuning TCP/IP for high-throughput, high-latency links) appear to
exacerbate whatever this problem is.
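
For the record, those toggles are just boolean files, so it's easy to
flip them back after testing; same /proc paths as above:

    # cd /proc/sys/net/ipv4
    # cat tcp_sack tcp_timestamps tcp_window_scaling   # 0 = disabled
    # echo 1 >tcp_sack             # re-enable SACK (RFC2018)
    # echo 1 >tcp_timestamps       # re-enable timestamps (RFC1323)
    # echo 1 >tcp_window_scaling   # re-enable window scaling (RFC1323)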

> > Try forcing both the switch and the NIC to half duplex mode.  In
> > my limited testing, I've found that if I force the NIC of the
> > machine that is *sending* the data to half duplex, the performance
> > problems mostly vanish.
> 
> Setting both the nic and the switch to half duplex does appear to
> solve the problem.  Thanks for the help.  If there is anything else
> I can check out let me know.

Aha.

I know that when full duplex equipment first started appearing, there
were initially many compatibility problems.  Autonegotiation rarely
worked, and there were even some switch/NIC combinations that just
didn't work well in full duplex mode.

I would've thought that these issues would've been resolved by now.
But perhaps it's the case that some versions of the 3Com Vortex NICs
just plain don't work well in full duplex, or don't work well when
they're talking in full duplex to certain switches.  I find it very
suspicious that *every* instance of these performance problems I've
seen so far is resolved by setting the affected equipment to half
duplex.  That might be just a little too much of a night-and-day
difference to be explained by a problem with the Linux kernel 2.2
TCP/IP stack...
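
For anyone else on the list who wants to try the same workaround
without touching the switch: assuming the NIC has an MII transceiver
(the 3c905B does) and you have the mii-tool utility from a recent
net-tools, you can inspect and force the duplex setting from
userland:

    # mii-tool eth0                    # show the negotiated media/duplex
    # mii-tool -F 100baseTx-HD eth0    # force 100 Mbps, half duplex
    # mii-tool -r eth0                 # restart autonegotiation to undo

Forcing only one side is what usually *causes* duplex mismatches, so
if you force the NIC, force the switch port to match, as Paul did.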

James
