[Fwd: icmp: ip reassembly time exceeded]

Robert G. Brown rgb at phy.duke.edu
Sat Jan 12 11:57:07 PST 2002

On Sat, 12 Jan 2002, Jacques B. Siboni wrote:

> Dear all,
> I am forwarding the following mail to ltsp and beowulf as I use these concepts and
> Mosix group seems to be in a very depressed mood.
> The problem I encounter occurs before Mosix even starts. There is some new
> kind of stuff with kernel 2.4.xx that does not accept some kinds of fragments.
> It is more an NFS boot problem.
> One (quick and dirty) solution could be to allow the kernel to load even with
> an mtu less than 1500, which I could not do.
> Thanks in advance
> Jacques

Dear Jacques,

This has the look and feel of a hardware problem with your physical
network.  The fact that small packets sometimes make it through but big
ones don't is telling indeed.  You don't describe your physical network,
but one of the following could easily be the problem:

   a) Bad wiring.  One cable with an almost-broken wire can do this.  So
can poorly wired connections at the punchblock or inside the RJ45 plug.

   b) Wiring runs that are too long.  100BT has a maximum radius of 100
m from a switch that can retime packets.  If runs are too long, a
collision condition can easily occur as one host brings up the line to
send but the signal doesn't have time to propagate to a host downstream
in time to keep it from ALSO bringing up the line to send.  In a high
traffic density network, lots of packets collide and are lost, and
perhaps smaller packets have a better chance of making it through at
least sometimes.

   c) Hubs instead of switches, especially too many hubs.  Packets sent
to a hub are echoed on all lines, and ANY system trying to send in the
same window will cause a collision.  Too many hubs add latency that
reduces the effective diameter of your network and increases the
probability of collisions.  Switches actually read a packet and
retransmit it on ONLY the line it is destined for, and retime the packet
besides.  This isolates systems from traffic not intended for them and
improves network stability and performance.  Offhand I can't remember
the maximum number of "repeaters" (hubs) permitted in a 100BT network --
something like 3 -- because I haven't used hubs for years now, ever
since switches got so cheap.  Good switches will also sometimes indicate
lines with a fault condition and isolate those lines.

   d) Cheap/bad NICs.  It is just my opinion, but this includes all
RTL8139 NICs from any manufacturer.  These NICs have exhibited behavior
like that which you describe in my own systems all by themselves on an
otherwise perfect network -- if you flood them with a packet stream,
they can easily end up dropping all but one or two packets in a hundred.
Again, using small packets probably doesn't improve their efficiency (it just
makes for a longer stream with even smaller interpacket gaps) but it
likely does improve the probability that a packet will make it through
before timing out.  Unfortunately, RTL8139's are nearly ubiquitous,
since they are available in $10 NICs and some folks cannot resist the
bargain.  If you have 8139's, just throw them away and buy a decent NIC
-- eepro100, tulip, 3c905 -- and your problem may magically go away.

   e) It's a long shot, but a poorly supported card/driver or
interference with a particular chipset, motherboard, or card
combination "can" cause things like this, although frankly I doubt it.  I'd
work a-d over pretty thoroughly before I started worrying about problems
in the base linux kernel or network drivers (RTL drivers excluded,
although it isn't really a driver problem per se) or exotic chipset
problems.  This is presuming that you are running a reasonably recent
and/or non-SMP production kernel.  If you are running a really old
kernel (especially a really old SMP kernel) or an exotic homemade kernel
with strange drivers or the like, after I finished asking "why" I'd
agree that doing something sort of dumb like this could also cause such
a problem.
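
To put a rough number on why big packets suffer more than small ones on
a lossy link, here is a minimal Python sketch.  It assumes (these are my
assumptions, not anything measured on your network) a 1500-byte MTU, a
20-byte IP header with no options, an 8-byte UDP header, and frames that
are lost independently with probability frame_loss.  A fragmented UDP
datagram, such as a large NFS read reply, is only delivered if every one
of its fragments arrives; otherwise the receiver eventually gives up and
reports "ip reassembly time exceeded".  The function names and payload
sizes are hypothetical, chosen only for illustration.

```python
import math

IP_HEADER = 20   # bytes, ASSUMED: no IP options
UDP_HEADER = 8   # bytes


def fragment_count(udp_payload: int, mtu: int = 1500) -> int:
    """Number of IP fragments needed to carry one UDP datagram."""
    # Each fragment carries at most (mtu - IP header) bytes, rounded down
    # to a multiple of 8 because fragment offsets count 8-byte units.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((udp_payload + UDP_HEADER) / per_fragment)


def delivery_probability(udp_payload: int, frame_loss: float,
                         mtu: int = 1500) -> float:
    """Chance that all fragments arrive, ASSUMING independent frame loss."""
    return (1.0 - frame_loss) ** fragment_count(udp_payload, mtu)


if __name__ == "__main__":
    for size in (1024, 4096, 8192):
        n = fragment_count(size)
        p = delivery_probability(size, frame_loss=0.10)
        print(f"{size:5d}-byte payload: {n} fragment(s), delivered with p = {p:.2f}")
```

With 10% frame loss, a payload that fits in a single frame gets through
nine times in ten, while a six-fragment datagram gets through only a bit
over half the time.  Real losses from collisions or a flooded NIC tend to
come in bursts, so treat this as an illustration of the trend only.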

There are some lovely online guides to the care and feeding of Ethernet
networks, many of them linked to www.phy.duke.edu/brahma or available on
the Scyld website.  One or more of them will (for example) tell you the
maximum number of repeaters permitted if in fact you are using hubs and
have a very large physical network.
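
For a feel of why cable length and hub count matter on a shared (hub)
segment, here is a back-of-the-envelope Python sketch.  A sender can
only detect a collision while it is still transmitting, so the round
trip across the collision domain has to fit inside the time it takes to
send a minimum-size frame (64 bytes, i.e. 512 bit times).  The
propagation speed and the per-hub delay below are assumed round numbers
of mine, NIC delays are ignored, and the function names are
hypothetical, so rely on the guides and the actual Ethernet rules for
real design limits.

```python
MIN_FRAME_BITS = 64 * 8          # minimum Ethernet frame: 512 bits
SIGNAL_SPEED_M_PER_S = 2.0e8     # ASSUMED: roughly 2/3 of c in copper
HUB_DELAY_S = 0.5e-6             # ASSUMED one-way delay per hub, illustrative


def slot_time(bit_rate: float) -> float:
    """Time to transmit a minimum-size frame, in seconds."""
    return MIN_FRAME_BITS / bit_rate


def round_trip(path_length_m: float, hubs: int) -> float:
    """Round-trip delay between the two farthest hosts, in seconds."""
    one_way = path_length_m / SIGNAL_SPEED_M_PER_S + hubs * HUB_DELAY_S
    return 2.0 * one_way


def collisions_detectable(path_length_m: float, hubs: int,
                          bit_rate: float = 100e6) -> bool:
    """True if a collision always reaches the sender before it stops sending."""
    return round_trip(path_length_m, hubs) < slot_time(bit_rate)


if __name__ == "__main__":
    for length, hubs in ((200, 1), (200, 4), (2000, 0)):
        ok = collisions_detectable(length, hubs)
        print(f"{length:5d} m, {hubs} hub(s): collisions detectable = {ok}")
```

When the check fails, a collision can arrive after the sender has
already finished its frame (a "late collision"), so the sender never
notices and never resends at the Ethernet level; the frame is simply
lost and the upper layers are left to time out.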

Hope this helps.  I'd advise investing in (minimally) a network cable
tester and if there is any chance at all your cable runs are too long,
in a reflectometer.  If you are using hubs or RTL NICs, I'd STRONGLY
recommend swapping them out for switches and decent NICs as rapidly as
possible, especially for the lines connecting to your servers.


Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
