[eepro100] wait_for_cmd_done timeout
John Madden
jmadden@ivytech.edu
Wed Aug 21 16:53:09 2002
I've seen this message a lot in the archives, and I've been experiencing
it on and off for about two years now, but no one seems to have come up
with an actual solution. Bad driver? Bad hardware? Which is it?
Info: Using the eepro100 driver from Linux-2.4.14 (Dell 2450, SMP). A
kernel upgrade would be feasible if more recent versions fix the issue (I
didn't find anything in the changelogs though). This is a production
machine - kernel modules are not an option. I thought we had remedied the
situation by replacing the cards because the machine was up for 244 days
before this happened, and it's just happened now, only 14 days after the
most recent reboot. Rebooting seems to be the only way of fixing the
issue, too. Ugh.
Output of eepro100-diag -ee:
Index #1: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0xece0.
EEPROM contents, size 64x16:
00: a000 1ec9 4b88 0000 0000 0101 4401 0000
0x08: 3525 0903 0000 0000 0000 0000 0000 0000
...
0x38: 0000 0000 0000 0000 0000 0000 0000 2d3f
The EEPROM checksum is correct.
Intel EtherExpress Pro 10/100 EEPROM contents:
Station address 00:A0:C9:1E:88:4B.
Receiver lock-up bug exists. (The driver work-around *is* implemented.)
Board assembly 352509-003, Physical connectors present: RJ45
Primary interface chip DP83840 PHY #1.
Transceiver-specific setup is required for the DP83840 transceiver.
Index #2: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0xecc0.
EEPROM contents, size 64x16:
00: a000 13c9 6268 0000 0000 0101 4401 0000
0x08: 3525 0903 0000 0000 0000 0000 0000 0000
...
0x38: 0000 0000 0000 0000 0000 0000 0000 215f
The EEPROM checksum is correct.
Intel EtherExpress Pro 10/100 EEPROM contents:
Station address 00:A0:C9:13:68:62.
Receiver lock-up bug exists. (The driver work-around *is* implemented.)
Board assembly 352509-003, Physical connectors present: RJ45
Primary interface chip DP83840 PHY #1.
Transceiver-specific setup is required for the DP83840 transceiver.
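
For anyone cross-checking the dumps, the decoded lines can be reproduced from
the raw EEPROM words. The following is only a minimal Python sketch: the
function names are hypothetical, the byte order (low byte of each word first)
is inferred from the dump itself, and the 0xBABA checksum target is my
understanding of the convention for these adapters rather than something the
output states. The rows elided with "..." are not reproduced, so the sum
covers the visible words only.

```python
# Visible EEPROM rows copied from the two dumps above, keyed by row offset.
# Rows elided with "..." in the post are deliberately left out.
CARD1 = {
    0x00: [0xa000, 0x1ec9, 0x4b88, 0x0000, 0x0000, 0x0101, 0x4401, 0x0000],
    0x08: [0x3525, 0x0903, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000],
    0x38: [0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x2d3f],
}
CARD2 = {
    0x00: [0xa000, 0x13c9, 0x6268, 0x0000, 0x0000, 0x0101, 0x4401, 0x0000],
    0x08: [0x3525, 0x0903, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000],
    0x38: [0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x215f],
}


def station_address(card):
    """Build the MAC from words 0..2, taking the low byte of each word first."""
    octets = []
    for word in card[0x00][:3]:
        octets += [word & 0xFF, word >> 8]
    return ":".join(f"{o:02X}" for o in octets)


def board_assembly(card):
    """Format words 8 and 9 the way the 'Board assembly' line appears."""
    w8, w9 = card[0x08][0], card[0x08][1]
    high, low = w9 >> 8, w9 & 0xFF
    return f"{w8:04x}{high:02x}-{low:03d}"


def visible_sum(card):
    """16-bit sum of the visible words only (assumed target: 0xBABA)."""
    return sum(sum(row) for row in card.values()) & 0xFFFF


for name, card in (("Index #1", CARD1), ("Index #2", CARD2)):
    print(name, station_address(card), board_assembly(card),
          hex(visible_sum(card)))
```

Both cards decode to the station address and board assembly printed by
eepro100-diag. The visible words alone sum to 0xbaba, which fits the
"checksum is correct" line if the elided rows contribute nothing to the sum.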
When the wait_for_cmd_done issue takes hold, it can be stopped by downing
the interfaces, but bringing them back up brings the problem back. I did
notice something odd today though, just after this started happening:
eepro100-diag -ee reported the same as above, with an additional line:
"Command register has unprocessed command 0020(?!)"
What does that mean?
Please let me know if I can provide more information. This is our main
web server, so it looks *really* bad when this happens. I'd like to see
this issue resolved if possible.
Thanks,
John
--
John Madden
UNIX Systems Engineer
Ivy Tech State College
jmadden@ivytech.edu