[Beowulf] substantial RX packet drops during Pallas over e1000 (Rocks 4.1)
David Kewley
kewley at gps.caltech.edu
Thu May 18 15:03:46 PDT 2006
On Tuesday 16 May 2006 23:44, Jeff Johnson wrote:
> Packet drop example: (other nodes post similar numbers)
> RX packets:1843133 errors:0 dropped:1245 overruns:0 frame:0
> TX packets:1764828 errors:0 dropped:0 overruns:0 carrier:0
Question is: Where do these drops occur?
I haven't looked into it in detail, but I will suggest that this "dropped"
statistic may represent packets that the kernel received successfully but
could not deliver to the application's socket receive buffer, because the
application wasn't draining that buffer fast enough and it overflowed.
At least, I've seen that behavior in other circumstances.
The "Recv-Q" and "Send-Q" columns in the output of netstat show you the
current size of data in the socket receive & send buffers. I don't know if
there's a better way to keep an eagle eye on a particular socket's buffer
used sizes.
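If eyeballing netstat gets tedious, a small script can poll /proc/net/tcp and
report the rx_queue field (what netstat calls Recv-Q) for sockets on your
application's port. This is only a rough sketch, not anything specific to the
Pallas setup, and the port number is just a placeholder:

#!/usr/bin/env python
# Sketch: report Recv-Q (rx_queue, in bytes) for TCP sockets on one local port.
import time

PORT = 5001   # placeholder -- substitute your application's port

def rx_queues(port):
    queues = []
    for line in open('/proc/net/tcp').readlines()[1:]:
        fields = line.split()
        local_port = int(fields[1].split(':')[1], 16)   # local addr is hexip:hexport
        rx_queue = int(fields[4].split(':')[1], 16)     # field 4 is tx_queue:rx_queue
        if local_port == port:
            queues.append(rx_queue)
    return queues

while True:
    print time.strftime('%H:%M:%S'), rx_queues(PORT)
    time.sleep(1)

If those numbers climb toward the socket's buffer size and stay there, the
application isn't pulling data off the socket fast enough.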
Adding lines something like the following to /etc/sysctl.conf may help you,
so long as the application is fundamentally able to keep up with the
average flow rate and just needs a little headroom to ride out brief periods
of high flow or longer packet-pickup latency:
# Raise the core network read/write buffer defaults to ~512k and maxes to ~1M
net.core.rmem_default = 524287
net.core.rmem_max = 1048575
net.core.wmem_default = 524287
net.core.wmem_max = 1048575
# Increase TCP write buffer max size from the default 128k to 512k
net.ipv4.tcp_wmem = "4096 16384 524288"
Of course in your case we're concerned about the rmem values (and, for TCP,
the analogous net.ipv4.tcp_rmem), not so much about the wmem values.
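Remember the new values only take effect after running sysctl -p (or after a
reboot). Also, if you control the application source, it can ask for a larger
receive buffer itself; raising net.core.rmem_max mainly raises the ceiling the
kernel will grant such requests. A minimal sketch (the 512k request is an
arbitrary example, not a tuned value):

import socket

# Ask the kernel for a larger receive buffer on one socket.  The request
# is capped by net.core.rmem_max, so raise that limit first if necessary.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 512 * 1024)

# See what was actually granted; Linux reports roughly double the
# requested value, capped by rmem_max.
print s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)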
David