[Beowulf] Intel 82574L problems with newer kernels?

Bill Broadley bill at cse.ucdavis.edu
Tue Dec 11 18:21:05 PST 2012


Does anyone have working tweaks to get the Intel e1000e driver + 82574L
chip to behave with Linux 3.5 or 3.7 kernels?  I'm not sure whether this
is a problem for all 82574Ls or just the ones on recent Supermicro
motherboards.

I noticed stuttering, occasional high latencies, and a continuously
increasing dropped-packet count in ifconfig:
  RX packets:13437889 errors:0 dropped:14185 overruns:0 frame:0
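
As a rough way to put that counter in proportion, here is a minimal sketch
that parses the RX line above and reports the drop fraction.  It assumes the
older "key:value" ifconfig layout shown here (newer net-tools print a
different format), and the function names are made up for illustration.

```python
import re

def parse_rx_line(line):
    """Return the counters from an ifconfig-style 'key:value' line as ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+):(\d+)", line)}

def drop_fraction(counters):
    """Dropped packets as a fraction of received packets (0.0 if none)."""
    packets = counters.get("packets", 0)
    return counters.get("dropped", 0) / packets if packets else 0.0

# The RX line quoted in this message.
line = "RX packets:13437889 errors:0 dropped:14185 overruns:0 frame:0"
counters = parse_rx_line(line)
print(f"dropped {counters['dropped']} of {counters['packets']} packets "
      f"({100 * drop_fraction(counters):.3f}%)")
```

Running it twice a few minutes apart and comparing the "dropped" values shows
whether the count is still climbing.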

Even something simple like ping -c 100 would show at least one packet
with a latency of over 1 second.
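
To pick those outliers out of a longer run, something like the following
sketch could be used.  It assumes the usual iputils reply format
("icmp_seq=N ... time=X ms", or "icmp_req=N" on some older builds); the
sample text is hypothetical and made up for illustration, not measured data.

```python
import re

# Matches the sequence number and "time=12.3 ms" field of a ping reply line.
REPLY_RE = re.compile(r"icmp_(?:seq|req)=(\d+).*?time=([\d.]+) ms")

def slow_replies(ping_output, threshold_ms=1000.0):
    """Return (sequence, rtt_ms) pairs whose round trip exceeds threshold_ms."""
    slow = []
    for line in ping_output.splitlines():
        match = REPLY_RE.search(line)
        if match and float(match.group(2)) > threshold_ms:
            slow.append((int(match.group(1)), float(match.group(2))))
    return slow

# Hypothetical sample output (addresses and timings are invented).
sample = """\
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.154 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=1021 ms
64 bytes from 10.0.0.2: icmp_seq=3 ttl=64 time=0.161 ms
"""
print(slow_replies(sample))
```

Feeding it the captured output of ping -c 100 lists only the replies that
took longer than the threshold.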

Several discussions mention that some of the errors are not logged, so
it may well be significantly worse than you'd think from the dropped
packet count.

Replacing the cables, switch, or even the entire node doesn't seem to
make any difference.  Googling "linux 82574L dropped" finds quite a few
discussions about it.  Most of those that provide details mention
Supermicro motherboards.

There seems to be a solution for CentOS 6, but I'm having problems
getting that fix to work with newer kernels.

Some of the discussions:

http://www.linuxquestions.org/questions/linux-hardware-18/intel-82574l-gigabit-network-card-issues-and-resolution-831364/

http://www.doxer.org/learn-linux/resolved-intel-e1000e-driver-bug-on-82574l-ethernet-controller-causing-network-blipping/

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1018561

http://sourceforge.net/projects/e1000/files/e1000e%20stable/eeprom_fix_82574_or_82583/

This is mostly a concern for me because it's a noticeable performance
problem and Supermicro-based resellers seem to be winning the cluster
bids recently.  With newer Sandy Bridge, Ivy Bridge, Bulldozer, and
Piledriver CPUs it seems worthwhile to run a relatively new kernel.

Has anyone been successful in getting the 82574L to work as expected,
particularly with a Supermicro motherboard?

I've tried all the discussed fixes, including but not limited to updating
the driver, upgrading from a 3.5 kernel to a 3.7 kernel, turning
pcie_aspm off, various e1000e.IntMode settings,
e1000e.InterruptThrottleRate, acpi=off, disabling various features with
ethtool, and patching the e1000e firmware.
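
When juggling this many boot-time settings, it is easy to lose track of
which ones a given boot actually had.  The sketch below checks a kernel
command line for the settings named above.  The helper names are invented
for illustration, the sample command line is hypothetical, and it only shows
what was requested at boot, not whether the driver honored it.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params

# The boot-time settings named in this message.
TWEAKS = ("pcie_aspm", "acpi", "e1000e.IntMode", "e1000e.InterruptThrottleRate")

def active_tweaks(cmdline):
    """Return the subset of TWEAKS present on the given command line."""
    params = parse_cmdline(cmdline)
    return {name: params[name] for name in TWEAKS if name in params}

def running_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read()

# Hypothetical command line, for illustration only.
example = "BOOT_IMAGE=/vmlinuz ro quiet pcie_aspm=off e1000e.IntMode=1"
print(active_tweaks(example))
```

On a live node, active_tweaks(running_cmdline()) reports the settings for
the current boot.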

For such a popular chip and driver, I'm surprised that the problems seem
to be lingering.  Then again, I suspect most people are happy when a
network provides connectivity and care less about performance.  Hence my
email to the Beowulf list.









