weird network DoS LX164, Tulip, RedHat

Alexander L. Belikoff
Thu Dec 10 06:39:01 1998

Hello everybody,

We have a couple of Alphas (LX164), running RedHat Linux 4.2. The
hardware configuration is rather simple: 

* LX164 motherboard, 128 MB of RAM, 128 MB of swap space

* SRM console

* headless machine (no kbd/mouse/videocard) - serial console

* SCSI controller: Intraserver NCR 53c8xx based. Detected as:

kernel: ncr53c8xx: at PCI bus 0, device 6, function 0
kernel: ncr53c8xx: PCI_LATENCY_TIMER=0, bursting should'nt be allowed.
kernel: ncr53c8xx: PCI_CACHE_LINE_SIZE not set, features based on CACHE LINE SIZE not used.
kernel: ncr53c8xx: 53c875 detected
kernel: ncr53c875-0: rev=0x04, base=0x9001000, io_port=0x9000, irq=16
kernel: ncr53c875-0: NCR clock is 40218KHz, 40218KHz
kernel: ncr53c875-0: ID 7, Fast-20, Parity Checking
kernel: ncr53c875-0: on-chip RAM at 0x9002000
kernel: ncr53c875-0: restart (scsi reset).
kernel: ncr53c875-0: copying script fragments into the on-chip RAM ...
kernel: scsi0 : ncr53c8xx - revision 2.6n

* SCSI hard drive:

  Vendor: SEAGATE   Model: ST19101W          Rev: 0014
  Type:   Direct-Access                      ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
scsi : detected 1 SCSI disk total.
ncr53c875-0-<0,0>: FAST-5 WIDE SCSI 10.0 MB/s (200 ns, offset 15)
SCSI device sda: hdwr sector= 512 bytes. Sectors= 17783240 [8683 MB] [8.7 GB]
ncr53c875-0-<0,0>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 15)

(we force fast SCSI by doing 'echo "setsync 0 12" > /proc/scsi/ncr53c8xx/0')

* DEC DS21140 Tulip network card running in 100Mbps full duplex:

tulip.c:v0.90 10/20/98
eth0: Digital DS21140 Tulip at 0x8800, 00 c0 f0 31 ab 02, IRQ 19.
eth0:  EEPROM default media type Autosense.
eth0:  Index #0 - Media MII (#11) described by a 21140 MII PHY (1) block.
eth0:  MII transceiver #1 config 3000 status 7829 advertising 01e1.
  PCI latency timer (CFLT) is unreasonably low at 0.  Setting to 64 clocks.
eth0:  Advertising 01e1 on PHY 0 (1).
eth0: The transmitter stopped!  CSR5 is fc678006, CSR6 320e2202.
eth0: Setting full-duplex based on MII Xcvr #1 parter capability of 41e1.

The machines run kernel 2.0.30 with patches from RedHat, plus a
serial console patch.

The problem is random network outages on these machines: every so
often one of them simply ceases all network activity (including
responding to pings). The machine remains functional in that it
allows login from the console, and the problem can be remedied by
bouncing the network interface.

One indication that often (but not always) shows up is a burst of the
following messages:

kernel: Couldn't get a free page.....
kernel: eth0: Memory squeeze, deferring packet.
last message repeated 13 times
kernel: eth0: Too much work at interrupt, csr5=0xfc6980c0.
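
To see how often these symptoms line up with an outage, the log can be
tallied mechanically. The following is only a minimal sketch: the
function and variable names are made up for illustration, the match
strings are copied from the messages quoted above, and it assumes
syslogd's "last message repeated N times" line always refers to the
line just before it.

```python
import re
from collections import Counter

# Substrings copied from the messages quoted above.
SYMPTOMS = (
    "Couldn't get a free page",
    "Memory squeeze, deferring packet",
    "Too much work at interrupt",
)

# syslogd's summary line for consecutive duplicate messages.
REPEAT = re.compile(r"last message repeated (\d+) times?")


def count_symptoms(lines):
    """Tally symptom messages, counting syslogd repeat summaries too."""
    counts = Counter()
    previous = None  # symptom matched by the most recent ordinary line
    for line in lines:
        repeat = REPEAT.search(line)
        if repeat:
            # The summary stands for N more copies of the previous line.
            if previous is not None:
                counts[previous] += int(repeat.group(1))
            continue
        previous = next((s for s in SYMPTOMS if s in line), None)
        if previous is not None:
            counts[previous] += 1
    return counts


sample = [
    "kernel: Couldn't get a free page.....",
    "kernel: eth0: Memory squeeze, deferring packet.",
    "last message repeated 13 times",
    "kernel: eth0: Too much work at interrupt, csr5=0xfc6980c0.",
]
counts = count_symptoms(sample)
```

In practice the lines would come from the system log rather than a
hard-coded list; its path depends on the syslog configuration.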

We've made an attempt to overcome the problem by using a new kernel,
namely axp_linux-2.0.34 (a patched 2.0.34) from The
latter had a broken Tulip driver, so I've upgraded it to tulip 0.90.
However, this didn't really help: we had the very same outage the very
next day, although it didn't display the messages above. What it did
show was a couple of alignment traps:

Couldn't get a free page.....
kernel: unaligned trap at fffffc0000364d54: fffffc00078b0046 28 2
kernel: unaligned trap at fffffc0000364e10: fffffc00078b0046 28 1
kernel: unaligned trap at fffffc0000364eb0: fffffc00078b0056 28 2
kernel: unaligned trap at fffffc0000364eb8: fffffc00078b0056 28 3
kernel: unaligned trap at fffffc000037d154: fffffc0000200056 28 16
kernel: unaligned trap at fffffc000037d154: fffffc00078b2056 28 16
kernel: unaligned trap at fffffc000037d154: fffffc0007fd583e 28 16

According to the system map, the latter occurs somewhere in '',
whatever this routine may be...
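
For anyone wanting to repeat the lookup, here is a rough sketch of
resolving a trap address against a System.map. It assumes the first hex
field of a trap line is the faulting PC and the second is the data
address being accessed, and that the map uses the usual "address type
name" layout. All function names are made up, and the two map entries
are invented placeholders, not symbols from our kernel.

```python
import bisect
import re

# pc, data address, then two trailing fields that are not decoded here.
TRAP = re.compile(
    r"unaligned trap at ([0-9a-f]{16}): ([0-9a-f]{16}) ([0-9a-f]+) (\d+)"
)


def parse_trap(line):
    """Return (pc, address) as integers, or None if the line is not a trap."""
    m = TRAP.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


def load_symbols(map_lines):
    """Parse 'address type name' lines into a list sorted by address."""
    symbols = []
    for line in map_lines:
        parts = line.split()
        if len(parts) == 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def lookup(symbols, pc):
    """Return (name, offset) of the last symbol at or below pc, or None."""
    addresses = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addresses, pc) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return name, pc - addr


# Hypothetical map entries: names and addresses are invented for the demo.
demo_map = [
    "fffffc000037d000 T example_routine_a",
    "fffffc000037d400 T example_routine_b",
]
symbols = load_symbols(demo_map)

line = "kernel: unaligned trap at fffffc000037d154: fffffc00078b2056 28 16"
pc, address = parse_trap(line)
result = lookup(symbols, pc)   # symbol containing the faulting PC
misalignment = address % 4     # distance from 4-byte alignment
```

With the real System.map in place of demo_map, lookup() gives the
routine name and the offset into it. Taking the second field modulo 4
for the seven traps quoted above gives 2 in every case.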

If anybody has, or has had, similar problems, please help.

Thanks in advance,

Alexander L. Belikoff
Bloomberg L.P. / BFM Financial Research Ltd.,