[Beowulf] tg3 driver and rx dropped packets

Michael Di Domenico mdidomenico4 at gmail.com
Fri Dec 18 05:40:47 PST 2009

Perhaps my brain is already checking out for the holidays, but perhaps
someone can shed some light...

I have several Dell T3500 workstations in which we've installed PCI-X
Broadcom cards, specifically the BCM5703 fiber cards.

We're running Red Hat 5.4 with the 2.6.18-164.6.1 kernel and all the stock drivers.

For some reason that I cannot determine, when we read a large amount
of data into the workstation we see the RX dropped counter increase
steadily (and rapidly), eventually locking up the TCP transmission,
which results in an aborted file operation.
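
For anyone who wants to watch the same counter during a transfer, here is a minimal sketch in Python. It is not part of our actual test setup. It assumes Python 3 is available on the workstation, that the interface is named eth0 (a placeholder), and that /proc/net/dev uses the usual 16-column layout where the fourth receive column and the fourth transmit column are the drop counters. The function names are made up for illustration.

```python
import time


def parse_drop_counters(proc_net_dev_text):
    """Return {iface: (rx_dropped, tx_dropped)} parsed from /proc/net/dev text.

    Assumes the usual layout: 8 receive columns followed by 8 transmit
    columns, with "drop" as the 4th column of each group.
    """
    counters = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 16:
            continue  # unexpected layout, skip rather than guess
        counters[name.strip()] = (int(fields[3]), int(fields[11]))
    return counters


def watch_drops(iface="eth0", interval=1.0, path="/proc/net/dev"):
    """Print how much the drop counters grew in each interval (runs until Ctrl-C)."""
    previous = None
    while True:
        with open(path) as f:
            rx, tx = parse_drop_counters(f.read())[iface]
        if previous is not None:
            print("%s rx_dropped +%d  tx_dropped +%d"
                  % (iface, rx - previous[0], tx - previous[1]))
        previous = (rx, tx)
        time.sleep(interval)
```

Calling watch_drops("eth0") in one terminal while the read runs in another shows the per-second growth of the counter, which makes it easier to tell whether the drops track the transfer rate or arrive in bursts.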

This only happens on reads; when we do a write operation we do not see TX drops.

We've tested this against different types of servers (NFS, HTTP) at
various points in the network, and the only common factor is the NIC.

I'd think it was just a bad NIC, but we have several NICs across T3500s
and T3400s doing this.

Is anyone aware of such an issue?  Can anyone recommend some steps I
can take to isolate why the packets are being dropped?

I hooked up Wireshark on one of the servers while we were running the
test, and I see a lot of Duplicate ACK and TCP checksum errors in the
communication between the two hosts.  But I'm not sure that actually
points to anything.


