[Beowulf] tcp error: Need ideas!
Gerry Creager
gerry.creager at tamu.edu
Fri Jan 23 05:49:23 PST 2009
First, thanks to all who've responded. I've been looking into this a bit this
morning and am trying to grok the results.
Joe Landman wrote:
> Hi Gerry
>
> Gerry Creager wrote:
>> History/background/description of the cluster
>> * 126-node Dell 1950 cluster with dual quad-core Xeons
>> * HP 5412zl switch for the gigabit cluster backplane and 10GbE
>> interconnect to selected services (file server, etc.)
>> * Gigabit interconnect
>> * Hand-compiled 2.6.26 kernel
>> * bnx2 module loaded for the Broadcom onboard NICs
>> * Switch, compute nodes, head node set to 9000 byte MTU
>
> We have had *lots* of problems with Broadcom NICs and jumbo frames, from the
> 2.6.9 timeframe onwards.
Marvelous. I'd prefer to not have to back-rev if I can avoid it...
>>
>> We're seeing the following error in WRF compiled with openMPI and the
>> PGI 7.2 compiler:
>> mca_btl_tcp_frag_send:writev failed with errno=104
>>
>> While all nodes were accessible prior to the run and returned
>> appropriate "stuff" when queried with, e.g., ssh and a command, two
>> nodes now return something like this:
>> [gerry@brazos SCOOP12km]$ ssh c0522
>> Received disconnect from 192.168.200.154: 2: Bad packet length 808464432.
>
> Hmmm... sounds like a link tried re-negotiating. Can you get on via
> serial/console and
My guess is that the driver wandered across memory boundaries. This
stinks of a buffer problem to me. Typically, after this happens, I
can't log into the node via any interface, not even on the console. It requires
an IPMI or physical reboot.
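The odd length in the ssh error may itself be a clue: 808464432 is 0x30303030, i.e. the four ASCII bytes "0000". So one end of the connection appears to have read text where it expected a binary packet-length field, which fits (but does not prove) the theory that data is being corrupted somewhere below ssh. A small sketch for decoding such values; `len_as_ascii` is a hypothetical helper name used only for illustration:

```shell
#!/bin/sh
# Decode a 32-bit big-endian integer (such as the "Bad packet length"
# value reported by ssh) into its four bytes, printed as characters.
# len_as_ascii is a hypothetical helper, not part of any tool.
# Caveat: NUL and newline bytes do not survive shell command substitution,
# so this is only meant for eyeballing printable values.
len_as_ascii() {
    n=$1
    out=""
    for s in 24 16 8 0; do
        b=$(( (n >> s) & 255 ))
        # printf with a \NNN octal escape in the format emits the raw byte.
        out="$out$(printf "\\$(printf '%03o' "$b")")"
    done
    printf '%s\n' "$out"
}

printf '0x%08x\n' 808464432   # prints 0x30303030
len_as_ascii 808464432        # prints 0000
```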
> root@lightning:~# ethtool eth0
-bash-3.2# ethtool eth1
Settings for eth1:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: g
Wake-on: d
Link detected: yes
> You might want to
>
> ethtool -s eth0 autoneg off
>
> to force it not to renegotiate its speed. Also, look at
-bash-3.2# ethtool -A eth1 autoneg off
autoneg unmodified, ignoring
no pause parameters changed, aborting
> root@lightning:~# ethtool -g eth0
-bash-3.2# ethtool -g eth1
Ring parameters for eth1:
Pre-set maximums:
RX: 1020
RX Mini: 0
RX Jumbo: 4080
TX: 255
Current hardware settings:
RX: 255
RX Mini: 0
RX Jumbo: 765
TX: 255
> See if you can do something like
>
> ethtool -G eth0 rx-jumbo 100
>
> if you have zero jumbo ring rx entries.
The jumbo RX ring is already at 765 entries (of a 4080 maximum), so this
doesn't look like it requires much change.
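Joe's check can be scripted so it can be run across all 126 nodes. A minimal sketch, assuming the `ethtool -g` layout shown above (a "Current hardware settings:" section containing an "RX Jumbo:" line); the function name `jumbo_ring_current` is hypothetical:

```shell
#!/bin/sh
# Print the *current* RX Jumbo ring size from `ethtool -g <iface>` output
# read on stdin. The "Pre-set maximums" section also has an "RX Jumbo:"
# line, so matching only starts once "Current hardware settings" is seen.
jumbo_ring_current() {
    awk '/Current hardware settings/ { cur = 1 }
         cur && /RX Jumbo:/        { print $3; exit }'
}

# Example (on a node):
#   ethtool -g eth1 | jumbo_ring_current
# If it prints 0, the suggestion quoted above applies:
#   ethtool -G eth1 rx-jumbo 100
```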
Also, while I'm in the neighborhood, to respond to Mark's suggestions:
-bash-3.2# ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
Hmm. It might be worth turning off TCP segmentation offload here.
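A minimal sketch for pulling the enabled offloads out of `ethtool -k` output, assuming the "name: on|off" layout shown above (the helper name `enabled_offloads` is hypothetical). The `ethtool -K ... tso off` line in the comments is the standard way to switch off TCP segmentation offload for a test run:

```shell
#!/bin/sh
# List the offloads reported as "on" by `ethtool -k <iface>` (read on stdin).
# Some ethtool versions append " [fixed]"; the /^on/ test tolerates that.
enabled_offloads() {
    awk -F': *' '$2 ~ /^on/ { sub(/^[ \t]+/, "", $1); print $1 }'
}

# Example (on a node):
#   ethtool -k eth1 | enabled_offloads
# To turn off TCP segmentation offload on that interface:
#   ethtool -K eth1 tso off
```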
-bash-3.2# ethtool -S eth1
NIC statistics:
rx_bytes: 43454
rx_error_bytes: 0
tx_bytes: 51103
tx_error_bytes: 0
rx_ucast_packets: 231
rx_mcast_packets: 0
rx_bcast_packets: 329
tx_ucast_packets: 250
tx_mcast_packets: 0
tx_bcast_packets: 4
tx_mac_errors: 0
tx_carrier_errors: 0
rx_crc_errors: 0
rx_align_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
tx_deferred: 0
tx_excess_collisions: 0
tx_late_collisions: 0
tx_total_collisions: 0
rx_fragments: 0
rx_jabbers: 0
rx_undersize_packets: 0
rx_oversize_packets: 0
rx_64_byte_packets: 365
rx_65_to_127_byte_packets: 166
rx_128_to_255_byte_packets: 20
rx_256_to_511_byte_packets: 7
rx_512_to_1023_byte_packets: 1
rx_1024_to_1522_byte_packets: 1
rx_1523_to_9022_byte_packets: 0
tx_64_byte_packets: 42
tx_65_to_127_byte_packets: 84
tx_128_to_255_byte_packets: 31
tx_256_to_511_byte_packets: 97
tx_512_to_1023_byte_packets: 0
tx_1024_to_1522_byte_packets: 0
tx_1523_to_9022_byte_packets: 0
rx_xon_frames: 0
rx_xoff_frames: 0
tx_xon_frames: 0
tx_xoff_frames: 0
rx_mac_ctrl_frames: 0
rx_filtered_packets: 60
rx_discards: 0
rx_fw_discards: 0
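With this many counters, a filter that shows only non-zero error-type counters makes comparing a healthy node against a failing one quicker. A sketch under the assumption that error-related counters can be recognized by name; the name pattern below was chosen to fit the bnx2 counter names shown above and is not an official list, and `nonzero_error_counters` is a hypothetical helper:

```shell
#!/bin/sh
# From `ethtool -S <iface>` output on stdin, print only counters whose names
# look error-related and whose value is non-zero, as name=value.
nonzero_error_counters() {
    awk -F': *' '
        $1 ~ /err|discard|crc|align|collision|jabber|fragment|undersize|oversize/ && $2 + 0 > 0 {
            sub(/^[ \t]+/, "", $1)
            print $1 "=" $2
        }'
}

# Example (on a node):
#   ethtool -S eth1 | nonzero_error_counters
# For the statistics above it prints nothing, since every error counter is 0.
```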
-bash-3.2# ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:1E:C9:AC:27:FB
inet addr:192.168.200.154 Bcast:192.168.203.255 Mask:255.255.252.0
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:574 errors:0 dropped:0 overruns:0 frame:0
TX packets:265 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:44422 (43.3 KiB) TX bytes:54606 (53.3 KiB)
Interrupt:16 Memory:f4000000-f4012100
>> I'm stumped and looking for causes and solutions. Yeah, the WRF as
>> compiled did run before the change to jumbo frames.
>>
>> Do I reduce the size of the frames to something smaller, like 8800
>> bytes? 7500? 1500?
>
> In the past I had heard that jumbo frames may work on Broadcom NICs
> around 6000 byte length. We haven't tried this in a while ... YMMV.
>
>>
>> I'm not completely out of ideas but stumped.
>>
>> Thanks, gerry
>
>
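Whatever frame size is chosen, one way to confirm that full-size frames actually cross the switch end to end is a ping with fragmentation prohibited. The ICMP payload must be the MTU minus 28 bytes (a 20-byte IPv4 header without options plus the 8-byte ICMP header), so an MTU of 9000 gives 8972. A minimal sketch, assuming Linux iputils `ping` (whose `-M do` option prohibits fragmentation); `icmp_payload_for_mtu` is a hypothetical helper:

```shell
#!/bin/sh
# ICMP echo payload that exactly fills an IPv4 packet of the given MTU:
# MTU - 20 (IPv4 header, no options) - 8 (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Example (from the head node, against one of the affected nodes):
#   ping -M do -c 3 -s "$(icmp_payload_for_mtu 9000)" c0522
# The same arithmetic for the frame sizes discussed in this thread:
for mtu in 9000 8800 7500 6000 1500; do
    echo "MTU $mtu -> ping -s $(icmp_payload_for_mtu "$mtu")"
done
```

If the 8972-byte ping fails while a 1472-byte one succeeds, some hop is not passing jumbo frames.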
--
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843