[eepro100] Dell 4400 instability with eepro100 driver...

Henrik Schmiediche <henrik@stat.tamu.edu>
Sat Feb 16 23:24:00 2002


      Hello,
I have a single-processor Dell 4400 server with 4GB of RAM that I cannot get
to run stably under high network load (NFS, remote backups). I am about
ready to trash this system and go back to a Sun. I am running RH 7.2 with
kernel 2.4.9-13. I have tried the stock eepro100 driver that comes with RH,
the latest Intel 1.6.29 driver, and the latest eepro100 driver, and all of
them lock up. I also get lockups (WATCHDOG/timeout) when I install a 3Com
3c905C card (though I have not tried the latest driver for this card from
the Scyld website). I have also tried switching to a separate eepro100 card
(instead of the built-in one) with no success. When I load the latest
eepro100 driver I get the NMI messages below, which may be related to the
lockups, but I am not sure. I have also tried swapping the RAM, with no
success.

Feb 16 07:58:38 s0 kernel: eepro100.c:v1.20 1/28/2002 Donald Becker <becker@scyld.com>
Feb 16 07:58:38 s0 kernel:   http://www.scyld.com/network/eepro100.html
Feb 16 07:58:38 s0 kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
Feb 16 07:58:38 s0 kernel: You probably have a hardware problem with your RAM chips
Feb 16 07:58:38 s0 kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
Feb 16 07:58:38 s0 kernel: You probably have a hardware problem with your RAM chips
Feb 16 07:58:38 s0 kernel: Uhhuh. NMI received for unknown reason 25.
Feb 16 07:58:38 s0 kernel: Dazed and confused, but trying to continue
Feb 16 07:58:38 s0 kernel: Do you have a strange power saving mode enabled?
Feb 16 07:58:38 s0 kernel: eth0: Intel i82559 rev 8 at 0xf899f000, 00:B0:D0:20:87:60, IRQ 14.
Feb 16 07:58:38 s0 kernel:   Board assembly 07195d-000, Physical connectors present: RJ45
Feb 16 07:58:38 s0 kernel:   Primary interface chip i82555 PHY #1.
Feb 16 07:58:38 s0 kernel:   General self-test: passed.
Feb 16 07:58:38 s0 kernel:   Serial sub-system self-test: passed.
Feb 16 07:58:38 s0 kernel:   Internal registers self-test: passed.
Feb 16 07:58:38 s0 kernel:   ROM checksum self-test: passed (0x04f4518b).
Feb 16 07:58:38 s0 kernel:   Receiver lock-up workaround activated.

The error messages I get (a whole lot of them):

Feb 15 23:35:22 s0 kernel: Command 0080 was not immediately accepted, 10001 ticks!
Feb 15 23:35:54 s0 last message repeated 19 times
Feb 15 23:36:00 s0 last message repeated 3 times
Feb 15 23:36:04 s0 kernel: eth0: Transmit timed out: status 0090  0080 at 25279986/25280017 commands 000ca000 000c0000 000c0000.
Feb 15 23:36:04 s0 kernel: Command 0080 was not immediately accepted, 10001 ticks!
Feb 15 23:36:04 s0 kernel: eth0: Restarting the chip...
Feb 15 23:36:04 s0 kernel: Command 0070 was not accepted after 10001 polls!
Feb 15 23:36:08 s0 kernel: eth0: Transmit timed out: status 0000  0010 at 25279986/25280018 commands 000ca000 000c0000 000c0000.
Feb 15 23:36:08 s0 kernel: eth0: Restarting the chip...

A few additional comments:

   - I cannot recover from this except by rebooting. At least, I do not know
of another way.
   - The eepro100 card shares an interrupt (IRQ 14) with one of the SCSI
controllers. Is there a way to reassign the IRQ of the eepro100 card?
   - The system is even more unstable when I install a second CPU.

 Any ideas on what to try? Bad motherboard?

Sincerely,

      -  Henrik

Contents of /proc/interrupts:

          CPU0
  0:    5115449          XT-PIC  timer
  1:       1875          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  5:         30          XT-PIC  aic7xxx
  8:          1          XT-PIC  rtc
 10:   36396202          XT-PIC  aic7xxx
 11:          0          XT-PIC  usb-ohci
 12:       3151          XT-PIC  PS/2 Mouse
 14:    6638596          XT-PIC  aic7xxx, eth0
NMI:          3
ERR:          0
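
To make the IRQ sharing in the interrupt table above easier to spot, here is
a minimal Python sketch that scans /proc/interrupts-style text and reports
every IRQ line that serves more than one device. The function name
shared_irqs and the embedded SAMPLE string are illustrative only, and the
parser assumes a single CPU column, as on this machine.

```python
def shared_irqs(text):
    """Return {irq: [devices]} for IRQ lines that serve more than one device.

    Assumes the single-CPU layout: "IRQ: count  controller  dev[, dev...]".
    """
    shared = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip the CPU header and the NMI/ERR summary rows
        fields = rest.split(None, 2)  # count, controller, device list
        if len(fields) < 3:
            continue
        devices = [d.strip() for d in fields[2].split(",")]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared


# Copied from the table above.
SAMPLE = """\
          CPU0
  0:    5115449          XT-PIC  timer
  1:       1875          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  5:         30          XT-PIC  aic7xxx
  8:          1          XT-PIC  rtc
 10:   36396202          XT-PIC  aic7xxx
 11:          0          XT-PIC  usb-ohci
 12:       3151          XT-PIC  PS/2 Mouse
 14:    6638596          XT-PIC  aic7xxx, eth0
NMI:          3
ERR:          0
"""

if __name__ == "__main__":
    for irq, devices in shared_irqs(SAMPLE).items():
        print(f"IRQ {irq} is shared by: {', '.join(devices)}")
```

On a live system the same function could be fed the contents of
/proc/interrupts instead of SAMPLE, though multi-CPU machines have extra
count columns that this sketch does not handle.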

PCI devices found:
  Bus  0, device   0, function  0:
    Host bridge: ServerWorks CNB20LE Host Bridge (rev 5).
      Master Capable.  Latency=48.
  Bus  0, device   0, function  1:
    Host bridge: ServerWorks CNB20LE Host Bridge (#2) (rev 5).
      Master Capable.  Latency=48.
  Bus  0, device  17, function  0:
    Host bridge: ServerWorks CNB20LE Host Bridge (#3) (rev 5).
      Master Capable.  Latency=48.
  Bus  0, device  17, function  1:
    Host bridge: ServerWorks CNB20LE Host Bridge (#4) (rev 5).
      Master Capable.  Latency=48.
  Bus  0, device   4, function  0:
    Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 8).
      IRQ 14.
      Master Capable.  Latency=32.  Min Gnt=8.Max Lat=56.
      Non-prefetchable 32 bit memory at 0xfeb02000 [0xfeb02fff].
      I/O at 0xfcc0 [0xfcff].
      Non-prefetchable 32 bit memory at 0xfe900000 [0xfe9fffff].
  Bus  0, device   6, function  0:
    VGA compatible controller: ATI Technologies Inc 3D Rage IIC (rev 122).
      Master Capable.  Latency=32.  Min Gnt=8.
      Prefetchable 32 bit memory at 0xfd000000 [0xfdffffff].
      I/O at 0xf800 [0xf8ff].
      Non-prefetchable 32 bit memory at 0xfeb01000 [0xfeb01fff].
  Bus  0, device  15, function  0:
    ISA bridge: ServerWorks OSB4 South Bridge (rev 79).
  Bus  0, device  15, function  2:
    USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 4).
      IRQ 11.
      Master Capable.  Latency=32.  Max Lat=80.
      Non-prefetchable 32 bit memory at 0xfeb00000 [0xfeb00fff].
  Bus  6, device   4, function  0:
    PCI bridge: PCI device 8086:0962 (Intel Corporation) (rev 1).
      Master Capable.  Latency=32.  Min Gnt=6.
  Bus  7, device   4, function  0:
    SCSI storage controller: Adaptec 7899P (rev 1).
      IRQ 10.
      Master Capable.  Latency=32.  Min Gnt=40.Max Lat=25.
      I/O at 0xcc00 [0xccff].
      Non-prefetchable 64 bit memory at 0xfacff000 [0xfacfffff].
  Bus  7, device   4, function  1:
    SCSI storage controller: Adaptec 7899P (#2) (rev 1).
      IRQ 5.
      Master Capable.  Latency=32.  Min Gnt=40.Max Lat=25.
      I/O at 0xc800 [0xc8ff].
      Non-prefetchable 64 bit memory at 0xfacfe000 [0xfacfefff].
  Bus  7, device   6, function  0:
    SCSI storage controller: Adaptec AIC-7880U (rev 2).
      IRQ 14.
      Master Capable.  Latency=32.  Min Gnt=8.Max Lat=8.
      I/O at 0xc400 [0xc4ff].
      Non-prefetchable 32 bit memory at 0xfacfd000 [0xfacfdfff].

[root@s0:/var/log]# mii-diag
Using the default interface 'eth0'.
Basic registers of MII PHY #1:  3000 782d 02a8 0154 05e1 41e1 0003 0000.
 The autonegotiated capability is 01e0.
The autonegotiated media type is 100baseTx-FD.
 Basic mode control register 0x3000: Auto-negotiation enabled.
 You have link beat, and everything is working OK.
 Your link partner advertised 41e1: 100baseTx-FD 100baseTx 10baseT-FD 10baseT.
   End of basic transceiver information.
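
As a cross-check of the mii-diag summary, the following Python sketch
decodes the raw register dump by hand. It is a simplified illustration, not
how mii-diag itself is implemented: it ANDs the local advertisement
(register 4) with the link partner's abilities (register 5), masks the
10/100 technology bits, and picks the highest-priority common mode. The
names ABILITIES and negotiated are made up for this example.

```python
# 10/100 technology ability bits of MII registers 4 and 5, listed in
# autonegotiation priority order (highest first).
ABILITIES = [
    (0x0100, "100baseTx-FD"),
    (0x0200, "100baseT4"),
    (0x0080, "100baseTx"),
    (0x0040, "10baseT-FD"),
    (0x0020, "10baseT"),
]


def negotiated(advertised, partner):
    """Return (common ability bits, name of the best common mode or None)."""
    common = advertised & partner & 0x03E0  # keep only the ability bits
    for bit, name in ABILITIES:
        if common & bit:
            return common, name
    return common, None


# Register dump copied from the mii-diag output above (registers 0..7).
regs = [int(x, 16) for x in "3000 782d 02a8 0154 05e1 41e1 0003 0000".split()]

autoneg_enabled = bool(regs[0] & 0x1000)  # control register, bit 12
link_up = bool(regs[1] & 0x0004)          # status register, bit 2
common, media = negotiated(regs[4], regs[5])

if __name__ == "__main__":
    print(f"autonegotiation enabled: {autoneg_enabled}, link up: {link_up}")
    print(f"common capability {common:04x}, media type {media}")
```

The result agrees with the mii-diag summary (capability 01e0,
100baseTx-FD), so the PHY side of the link looks healthy.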