[eepro100] Quad port Compaq NC3134/35 i82559 = IRQ 23 is physically
blocked
Donald Becker
becker@scyld.com
Wed Feb 20 18:41:01 2002
On Wed, 20 Feb 2002, Claude LeFrancois (LMC) wrote:
> I try to install/configure a quad port Compaq NC3134 equipped with the
> NC3135 module into a server system. The NC3134 is a dual board, NC3135
> is a module installed on top of the NC3134 which provides 2 extra 10/100
> ports for a total of 4 ports. The board is a PCI 64 bit card. All the
> four ports are i82559 chipsets (eepro100).
If I'm thinking of the same board, the primary board contains a 21152 bus
bridge. The daughterboard has only the two '559 chips on the PCI bus.
> ... This
> system is also equipped with dual on-board i82559. It makes a total of 6
> i82559. The server runs RedHat 6.2 over a 2.2.17 kernel.
>
> The problem resides in the fact that 2 NICs are not working well. I got
> this message:
>
> eth0: IRQ 23 is physically blocked! Failing back to low-rate
> polling.
>
> It looks like an IRQ/IOAPIC problem. The faulty ports (both module ports
> on NC3135) are sharing IRQs with their parents (main ports on NC3134):
As you guessed, this indicates an IRQ mapping problem. And the APIC
table is usually to blame.
The quick work-around -- and one that Scyld always ships by default for
2.2 kernels -- is to use the "noapic" kernel option. This results in
unbalanced interrupts (they are all delivered to a single CPU), but this
can actually be good in some SMP environments.
It is possible that the IRQ isn't really blocked, just that there is a
race condition where the other CPU is currently handling the interrupt.
You can check this by starting up only eth0 and checking the interrupt
count. But I'm guessing from the low interrupt count that we really do
have a problem here.
> 22: 4 3 IO-APIC-level eth1, eth3
> 23: 4 4 IO-APIC-level eth0, eth2
...
> 28: 277 516 IO-APIC-level eth5
> 31: 279 92 IO-APIC-level eth4
Yup, not many interrupts are getting through. Does the count ever go
up?
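One way to answer that is to take two snapshots of /proc/interrupts a few
seconds apart and compare the totals. A rough Python sketch, assuming the
usual "IRQ: per-CPU counts, controller type, device names" layout; the
sample text is copied from the lines quoted above, and the function names
are made up for illustration:

```python
SAMPLE = """\
 22: 4 3 IO-APIC-level eth1, eth3
 23: 4 4 IO-APIC-level eth0, eth2
 28: 277 516 IO-APIC-level eth5
 31: 279 92 IO-APIC-level eth4
"""

def parse_interrupts(text):
    """Map IRQ number -> (total count across CPUs, device names)."""
    table = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # skip the CPU header row and named rows such as NMI
        fields = rest.split()
        counts = []
        while fields and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        # fields[0] is now the controller type, e.g. "IO-APIC-level"
        table[int(head)] = (sum(counts), " ".join(fields[1:]))
    return table

def stuck_irqs(before, after):
    """IRQs whose total count did not increase between two snapshots."""
    return sorted(irq for irq in after
                  if irq in before and after[irq][0] <= before[irq][0])

if __name__ == "__main__":
    for irq, (total, devices) in sorted(parse_interrupts(SAMPLE).items()):
        print(f"IRQ {irq}: {total} interrupts for {devices}")
```

On the live machine you would read /proc/interrupts twice, with some
traffic on the interface in between, and pass both parsed snapshots to
stuck_irqs().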
It is curious that there are two IRQs assigned (I'm guessing INTA and
INTB pins) rather than one or four.
> The board finally works but give a really slow rate:
>
> [root@lmcx2 /root]# ping 192.166.0.1
> PING 192.166.0.1 (192.166.0.1) from 192.166.50.1 : 56(84) bytes of
> data.
> eth0: IRQ 23 is physically blocked! Failing back to low-rate
> polling.
> 64 bytes from 192.166.0.1: icmp_seq=0 ttl=255 time=13.367 sec
> 64 bytes from 192.166.0.1: icmp_seq=1 ttl=255 time=12.370 sec
> 64 bytes from 192.166.0.1: icmp_seq=2 ttl=255 time=11.370 sec
> 64 bytes from 192.166.0.1: icmp_seq=3 ttl=255 time=10.370 sec
> 64 bytes from 192.166.0.1: icmp_seq=4 ttl=255 time=9.370 sec
> 64 bytes from 192.166.0.1: icmp_seq=5 ttl=255 time=8.370 sec
> 64 bytes from 192.166.0.1: icmp_seq=6 ttl=255 time=7.370 sec
> 64 bytes from 192.166.0.1: icmp_seq=7 ttl=255 time=6.370 sec
> 64 bytes from 192.166.0.1: icmp_seq=8 ttl=255 time=5.370 sec
> 64 bytes from 192.166.0.1: icmp_seq=9 ttl=255 time=4.370 sec
This is exactly what is expected when the interrupt isn't getting
through. The driver eventually decides to give up and processes all of
the packets in the Rx ring.
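The ping times quoted above show this directly. Assuming ping sent one
request per second (its default interval, not stated in the report),
adding each reply's send offset to its round-trip time gives the moment
the reply was actually delivered. A small Python check, with the times
copied from the quoted output:

```python
# Round-trip times in seconds, indexed by icmp_seq, from the quoted ping output.
rtts = [13.367, 12.370, 11.370, 10.370, 9.370,
        8.370, 7.370, 6.370, 5.370, 4.370]

def reply_arrival_times(rtts, interval=1.0):
    """Arrival time of each reply, measured from when icmp_seq=0 was sent.

    interval is the assumed gap between requests (ping's default of 1 s).
    """
    return [seq * interval + rtt for seq, rtt in enumerate(rtts)]

arrivals = reply_arrival_times(rtts)
spread = max(arrivals) - min(arrivals)
print(f"earliest {min(arrivals):.3f} s, latest {max(arrivals):.3f} s, "
      f"spread {spread:.3f} s")
```

Under that assumption all ten replies land within a few milliseconds of
each other, roughly 13.37 seconds after the first request, which is what
you would see if the whole Rx ring was drained in one pass.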
The low-rate polling isn't intended to work well. Instead it's a
fall-back so that you can ssh to the server and figure out what is
broken. To do high-throughput polling the driver would need many more
Rx buffers and access to 1000+ Hz polling rather than the kernel's
standard 100Hz timer ticks.
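For a rough feel of the numbers: if each poll can at best drain one full
Rx ring, the ring has to hold every frame that can arrive between two
polls. The Python sketch below is a back-of-the-envelope estimate, not
anything taken from the driver. It uses the standard Ethernet overhead of
20 bytes per frame on the wire (preamble plus inter-frame gap), and the
helper names are made up for illustration:

```python
import math

def wire_frames_per_second(link_bps, frame_bytes):
    """Maximum frame rate on the wire for a given frame size (incl. FCS)."""
    # 20 extra bytes per frame: 8 preamble/SFD + 12 inter-frame gap
    return link_bps / ((frame_bytes + 20) * 8)

def rx_buffers_needed(frames_per_second, poll_hz):
    """Rx buffers required so that nothing is dropped between two polls."""
    return math.ceil(frames_per_second / poll_hz)

if __name__ == "__main__":
    for frame_bytes in (64, 1518):
        fps = wire_frames_per_second(100e6, frame_bytes)
        for poll_hz in (100, 1000):
            n = rx_buffers_needed(fps, poll_hz)
            print(f"{frame_bytes}-byte frames, {poll_hz} Hz polling: "
                  f"{n} Rx buffers")
```

At 100 Mbit/s with minimum-size frames this comes to well over a thousand
buffers at 100 Hz, and still about 150 at 1000 Hz.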
> IO APIC #5......
...
> IRQ22 -> 6
> IRQ23 -> 7
> IRQ26 -> 10
> IRQ27 -> 11
> IRQ28 -> 12
> IRQ30 -> 14
> IRQ31 -> 15
...
> PCI->APIC IRQ transform: (B0,I4,P0) -> 28
> PCI->APIC IRQ transform: (B0,I5,P0) -> 26
> PCI->APIC IRQ transform: (B0,I5,P1) -> 27
> PCI->APIC IRQ transform: (B0,I6,P0) -> 31
> PCI->APIC IRQ transform: (B0,I15,P0) -> 10
> PCI->APIC IRQ transform: (B1,I0,P0) -> 30
> PCI->APIC IRQ transform: (B3,I4,P0) -> 22
> PCI->APIC IRQ transform: (B3,I5,P0) -> 23
> PCI->APIC IRQ transform: (B3,I6,P0) -> 22
> PCI->APIC IRQ transform: (B3,I7,P0) -> 23
...
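The sharing is easy to pull out of those transform lines mechanically. A
short Python sketch, reading the (B,I,P) triple as bus, device and
interrupt pin (my reading of the kernel message, so treat it as an
assumption); the log text is copied from the lines quoted above and the
function name is made up for illustration:

```python
import re
from collections import defaultdict

LOG = """\
PCI->APIC IRQ transform: (B0,I4,P0) -> 28
PCI->APIC IRQ transform: (B0,I5,P0) -> 26
PCI->APIC IRQ transform: (B0,I5,P1) -> 27
PCI->APIC IRQ transform: (B0,I6,P0) -> 31
PCI->APIC IRQ transform: (B0,I15,P0) -> 10
PCI->APIC IRQ transform: (B1,I0,P0) -> 30
PCI->APIC IRQ transform: (B3,I4,P0) -> 22
PCI->APIC IRQ transform: (B3,I5,P0) -> 23
PCI->APIC IRQ transform: (B3,I6,P0) -> 22
PCI->APIC IRQ transform: (B3,I7,P0) -> 23
"""

TRANSFORM = re.compile(r"\(B(\d+),I(\d+),P(\d+)\) -> (\d+)")

def shared_irqs(log_text):
    """Return {irq: [(bus, device, pin), ...]} for IRQs used more than once."""
    by_irq = defaultdict(list)
    for bus, dev, pin, irq in TRANSFORM.findall(log_text):
        by_irq[int(irq)].append((int(bus), int(dev), int(pin)))
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}

if __name__ == "__main__":
    for irq, devs in sorted(shared_irqs(LOG).items()):
        print(f"IRQ {irq} shared by {devs}")
```

For this log only IRQ 22 and IRQ 23 come out as shared, each by two
devices on bus 3, and all four entries report pin 0.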
> eepro100.c:v1.19 12/19/2001 Donald Becker <becker@scyld.com>
> http://www.scyld.com/network/eepro100.html
> eth0: OEM Intel i82559 rev 8 at 0xe0843000, 00:02:A5:DA:80:75, IRQ 23.
> eth1: OEM Intel i82559 rev 8 at 0xe0845000, 00:02:A5:DA:80:74, IRQ 22.
These are the problem interfaces on the daughtercard, correct?
(I expected the daughtercard interfaces to be eth2 & 3.)
> eth2: OEM Intel i82559 rev 8 at 0xe0847000, 00:02:A5:D6:4A:C3, IRQ 23.
> eth3: OEM Intel i82559 rev 8 at 0xe0849000, 00:02:A5:D6:4A:C2, IRQ 22.
And these are on the base PCI card and work fine.
> eth4: OEM Intel i82559 rev 8 at 0xe084b000, 00:30:48:11:FE:68, IRQ 31.
> eth5: OEM Intel i82559 rev 8 at 0xe084d000, 00:30:48:11:F7:62, IRQ 28.
And these are on the motherboard. (On-motherboard devices are always
last, designed so that a plug-in card overrides a potentially broken
on-board device.)
Donald Becker becker@scyld.com
Scyld Computing Corporation http://www.scyld.com
410 Severn Ave. Suite 210 Second Generation Beowulf Clusters
Annapolis MD 21403 410-990-9993