[realtek] rtl8139_tx_interrupt [8139too] problem in Linux cluster

Donald Becker becker@scyld.com
Sat Sep 14 10:37:01 2002


On Thu, 5 Sep 2002, Narisara Thongboonchoo wrote:

> I had trouble with a 4-node Linux cluster when running a program with MPI
> and ssh. I couldn't finish my job because one of the 4 nodes kept dying
> at random.

The same node, or different nodes?
  If it's the same node every time, you shouldn't be looking for a
  software fix.

> The job was killed because there was no route to that machine.  I'm not
> sure why it happened, but I found error messages about
> rtl8139_tx_interrupt & rtl8139_interrupt. Is it possible that network
> communication causes this problem? If so, could you give me any
> suggestions?

If this isn't a memory problem, then it's a device driver problem.  No
user-level software should be able to cause this type of kernel error.

> Call Trace: [<e098e308>] rtl8139_tx_interrupt [8139too] 0x128
> [<e098e91a>] rtl8139_interrupt [8139too] 0xba
> [<c0109c7a>] handle_IRQ_event [kernel] 0x3a
> [<c0109df8>] do_IRQ [kernel] 0x68
> 
> Code: ff 50 14 8b 00 29 32 c0 83 e0 d7 83 c8 04 5a a9 03 00 00
> <0> kernel panic: Aiee, killing interrupt handler!

-- 
Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993