[Beowulf] substantial RX packet drops during Pallas over e1000 (Rocks 4.1)
David Kewley
kewley at gps.caltech.edu
Thu May 18 15:03:46 PDT 2006
On Tuesday 16 May 2006 23:44, Jeff Johnson wrote:
> Packet drop example: (other nodes post similar numbers)
> RX packets:1843133 errors:0 dropped:1245 overruns:0 frame:0
> TX packets:1764828 errors:0 dropped:0 overruns:0 carrier:0
Question is: Where do these drops occur?
I haven't looked into it in detail, but I will suggest that this "dropped"
statistic may represent packets that the kernel received successfully but
could not deliver to the application's socket receive buffer, because the
application wasn't draining that buffer fast enough and it overflowed.
At least, I've seen that behavior in other circumstances.
The "Recv-Q" and "Send-Q" columns in the output of netstat show you the
current size of data in the socket receive & send buffers. I don't know if
there's a better way to keep an eagle eye on a particular socket's buffer
used sizes.
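If eyeballing netstat gets tedious, a small script can poll /proc/net/tcp and
report the rx_queue field (what netstat calls Recv-Q) for sockets on your
application's port. This is only a rough sketch, not anything specific to the
Pallas setup, and the port number is just a placeholder:

#!/usr/bin/env python
# Sketch: report Recv-Q (rx_queue, in bytes) for TCP sockets on one local port.
import time

PORT = 5001   # placeholder -- substitute your application's port

def rx_queues(port):
    queues = []
    for line in open('/proc/net/tcp').readlines()[1:]:
        fields = line.split()
        local_port = int(fields[1].split(':')[1], 16)   # local addr is hexip:hexport
        rx_queue = int(fields[4].split(':')[1], 16)     # field 4 is tx_queue:rx_queue
        if local_port == port:
            queues.append(rx_queue)
    return queues

while True:
    print time.strftime('%H:%M:%S'), rx_queues(PORT)
    time.sleep(1)

If those numbers climb toward the socket's buffer size and stay there, the
application isn't pulling data off the socket fast enough.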
Adding lines something like the following to /etc/sysctl.conf may help you,
so long as the application is fundamentally able to keep up with the
average flow rate and just needs a little headroom to ride out brief periods
of high flow or longer packet-pickup latency:
# Raise the core network read/write buffer defaults to ~512k and maxes to ~1M
net.core.rmem_default = 524287
net.core.rmem_max = 1048575
net.core.wmem_default = 524287
net.core.wmem_max = 1048575
# Increase TCP write buffer max size from the default 128k to 512k
net.ipv4.tcp_wmem = "4096 16384 524288"
Of course in your case we're concerned about the rmem values (and, for TCP,
the analogous net.ipv4.tcp_rmem), not so much about the wmem values.
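Remember the new values only take effect after running sysctl -p (or after a
reboot). Also, if you control the application source, it can ask for a larger
receive buffer itself; raising net.core.rmem_max mainly raises the ceiling the
kernel will grant such requests. A minimal sketch (the 512k request is an
arbitrary example, not a tuned value):

import socket

# Ask the kernel for a larger receive buffer on one socket.  The request
# is capped by net.core.rmem_max, so raise that limit first if necessary.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 512 * 1024)

# See what was actually granted; Linux reports roughly double the
# requested value, capped by rmem_max.
print s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)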
David