Troubles with Adaptec DuraLAN, SMP boxes, channel bonding

Ward Fenton ward@zurg.amazingmedia.com
Tue Dec 14 03:52:14 1999


I've been struggling with the starfire driver and channel bonding on about
a dozen Intel dual-processor servers. Each DuraLAN card has two ports, and
our switch is an Extreme Networks Summit 48.

I'm continuously receiving errors during periods of intense network
activity. I received the errors below while running "ping -f target"
from two local boxes simultaneously. When the "Something Wicked"
error occurs, the network link freezes momentarily, then resumes after a
few seconds. I have also run netperf, ftp, and scp with similar outcomes.
With netperf I've measured 192 Mbit/s of UDP bandwidth over my link.
TCP tests do not behave well and trigger my problems immediately. I've
spent days searching for clues and tricks to fix this problem
and now believe that I'm either going to have to boot with the noapic
option or migrate to different hardware. It seems that a multiport or
multiple-card tulip-based solution is the way to go, assuming 21143-based
cards are available and that the driver's recent support for hardware
interrupt mitigation works.

So far in my testing I'm only seeing these overruns with the starfire
driver. I've done some minor testing with 21140-based SMC EtherPower cards
without seeing the same problems.

One other small point of concern is that I thought that I'd come across a
message regarding the kern-2.3 network drivers at cesdis.gsfc.nasa.gov
which stated that the starfire driver and others hadn't received some
of the most recent updates which were already applied to the more common
drivers.

By the way, I can send some of my company's t-shirts to anyone who can
help get this thing moving.

Thanks in advance,
Ward


$ uname -a
Linux xxxxx 2.2.13ac3 #1 SMP Mon Dec 13 17:51:02 EST 1999 i686 unknown

$ cat /proc/interrupts 
           CPU0       CPU1       
  0:    1420115    1422245    IO-APIC-edge  timer
  1:         29         26    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 12:          1          0    IO-APIC-edge  PS/2 Mouse
 13:          1          0          XT-PIC  fpu
 19:      10983      10942   IO-APIC-level  aic7xxx, aic7xxx
 20:     116547     114677   IO-APIC-level  eth0
 21:     115460     116831   IO-APIC-level  eth1
NMI:          0
ERR:          0
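
To check how evenly the IO-APIC is spreading the NIC interrupts across the
two CPUs, the table above can be reduced to per-CPU shares. This is only a
minimal sketch: the names parse_interrupts and cpu_share are illustrative,
and it assumes the 2.2-style layout shown above (a CPU header row, then
"irq: counts... controller device").

```python
SAMPLE = """\
           CPU0       CPU1
  0:    1420115    1422245    IO-APIC-edge  timer
 19:      10983      10942   IO-APIC-level  aic7xxx, aic7xxx
 20:     116547     114677   IO-APIC-level  eth0
 21:     115460     116831   IO-APIC-level  eth1
NMI:          0
ERR:          0
"""

def parse_interrupts(text):
    """Map each device label to its list of per-CPU interrupt counts."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())            # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        _irq, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < ncpu + 2:          # NMI/ERR rows have no device column
            continue
        counts = [int(f) for f in fields[:ncpu]]
        label = " ".join(fields[ncpu + 1:]) # skip the controller-type column
        table[label] = counts
    return table

def cpu_share(counts):
    """Fraction of a device's interrupts handled by each CPU."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

if __name__ == "__main__":
    for dev, counts in parse_interrupts(SAMPLE).items():
        print(dev, [round(s, 3) for s in cpu_share(counts)])
```

On a live box, the text of /proc/interrupts would be passed in instead of
SAMPLE.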

$ cat /proc/net/dev
Inter-|   Receive
 face |bytes    packets errs drop fifo frame compressed multicast
    lo:     200       4    0    0    0     0          0         0
 bond0:16147680  163176    0    0    3     0          0         0
  eth0: 7477264   80485    0    0    1     0          0         0
  eth1: 8670416   82691    0    0    2     0          0         0

Inter-|   Transmit
 face |bytes      packets errs drop fifo colls carrier compressed
    lo:       200       4    0    0    0     0       0          0
 bond0:1332302098 3798142    0    0    3     0       0          0
  eth0:2813605078 1899071    0    0    1     0       0          0
  eth1:2813664316 1899071    0    0    2     0       0          0
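
The fifo columns are the overrun counters, so one way to catch the moment a
stall happens is to snapshot them before and after a test run. The following
is a rough sketch, not a finished tool: parse_net_dev and fifo_growth are
made-up names, and it assumes the usual unwrapped /proc/net/dev layout of one
line per interface carrying eight receive counters followed by eight transmit
counters.

```python
RX_FIELDS = ["bytes", "packets", "errs", "drop",
             "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop",
             "fifo", "colls", "carrier", "compressed"]

def parse_net_dev(text):
    """Return {iface: {"rx": {...}, "tx": {...}}} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:                 # the two header lines have no colon
            continue
        name, _, rest = line.partition(":") # "bond0:1614..." has no space after ':'
        nums = [int(n) for n in rest.split()]
        if len(nums) != 16:                 # assumption: 8 rx + 8 tx counters
            continue
        stats[name.strip()] = {
            "rx": dict(zip(RX_FIELDS, nums[:8])),
            "tx": dict(zip(TX_FIELDS, nums[8:])),
        }
    return stats

def fifo_growth(before, after):
    """Interfaces whose rx or tx fifo counter grew between two snapshots."""
    grown = {}
    for iface, now in after.items():
        old = before.get(iface)
        if old is None:
            continue
        rx = now["rx"]["fifo"] - old["rx"]["fifo"]
        tx = now["tx"]["fifo"] - old["tx"]["fifo"]
        if rx or tx:
            grown[iface] = (rx, tx)
    return grown
```

Usage would be to read /proc/net/dev once, run the flood ping or netperf
test, read it again, and pass both parsed snapshots to fifo_growth.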

$ ifconfig
bond0  Link encap:Ethernet  HWaddr 00:00:D1:DA:C6:33  
       inet addr:208.51.95.74  Bcast:208.51.95.127 Mask:255.255.255.192
       UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
       RX packets:161089 errors:0 dropped:0 overruns:3 frame:0
       TX packets:3795172 errors:0 dropped:0 overruns:3 carrier:0
       collisions:0 txqueuelen:0 

eth0   Link encap:Ethernet  HWaddr 00:00:D1:DA:C6:33  
       inet addr:208.51.95.74  Bcast:208.51.95.127 Mask:255.255.255.192
       UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
       RX packets:79532 errors:0 dropped:0 overruns:1 frame:0
       TX packets:1897586 errors:0 dropped:0 overruns:1 carrier:0
       collisions:0 txqueuelen:100 
       Interrupt:20 Base address:0xa000 

eth1   Link encap:Ethernet  HWaddr 00:00:D1:DA:C6:33  
       inet addr:208.51.95.74  Bcast:208.51.95.127 Mask:255.255.255.192
       UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
       RX packets:81557 errors:0 dropped:0 overruns:2 frame:0
       TX packets:1897586 errors:0 dropped:0 overruns:2 carrier:0
       collisions:0 txqueuelen:100 
       Interrupt:21 Base address:0xb000 


from dmesg:
starfire.c:v0.13 8/21/99  Written by Donald Becker
 Undates and info at http://www.beowulf.org/linux/drivers.html
eth0: Adaptec Starfire 6915 at 0x9005a000, 00:00:d1:da:c6:33, IRQ 20.
eth0: MII PHY found at address 1, status 0x782d advertising 01e1.
eth1: Adaptec Starfire 6915 at 0x900db000, 00:00:d1:da:c6:34, IRQ 21.
eth1: MII PHY found at address 1, status 0x782d advertising 01e1.
eth0: Setting full-duplex based on MII #1 link partner capability of 41e1.
eth1: Setting full-duplex based on MII #1 link partner capability of 41e1.
eth0: Something Wicked happened! 2048101.
eth0: Something Wicked happened! 2048101.
eth1: Something Wicked happened! 2048101.
eth0: Something Wicked happened! 2048101.
eth0: Link changed: Autonegotiation advertising 01e1  partner 41e1.
eth0: Something Wicked happened! ffffffff.
eth1: Link changed: Autonegotiation advertising 01e1  partner 41e1.
eth1: Something Wicked happened! ffffffff.
eth1: Link changed: Autonegotiation advertising 01e1  partner 41e1.
eth1: Something Wicked happened! ffffffff.
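
To compare machines, the "Something Wicked" lines can be tallied per
interface and per status word. This is a small sketch under the assumption
that the messages always have exactly the form shown above; count_wicked is
an illustrative name, not something from the driver.

```python
import re
from collections import Counter

# Matches e.g. "eth0: Something Wicked happened! 2048101."
WICKED = re.compile(r"^(eth\d+): Something Wicked happened! ([0-9a-f]+)\.$")

def count_wicked(log_text):
    """Count occurrences keyed by (interface, hex status word)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = WICKED.match(line.strip())
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Example: dmesg | python count_wicked.py
    for (iface, status), n in sorted(count_wicked(sys.stdin.read()).items()):
        print(iface, status, n)
```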