[realtek] rtl8139_tx_interrupt [8139too] problem in Linux cluster

Narisara Thongboonchoo nthongbo@cgrer.uiowa.edu
Thu Sep 5 16:30:00 2002


Dear realtek list users,

I had trouble with a 4-node Linux cluster when running a program with MPI and
ssh. I couldn't finish my job because one of the 4 nodes kept dying at random.
The job was killed because there was no route to that machine.
I'm not sure why it happened, but I found error messages about
rtl8139_tx_interrupt and rtl8139_interrupt. Is it possible that network communication
causes this problem? If so, could you give me any suggestions?

Regards,
Narisara

My system uses Red Hat 7.3 on 1.6 GHz P4 CPUs. The 4 nodes use Soyo P4VDA motherboards with
onboard Realtek 8139 LAN, a VIA P4X266A chipset, and 512 MB of DDR RAM.
The master uses a Soyo P4S Dragon Ultra motherboard with onboard SiS 900/7016 LAN, a SiS 645 chipset,
and 1.5 GB of DDR RAM. The network switch is a Netgear FS105 Fast Ethernet switch.

--------------------------------------------------------------------------------
*pde = 00000000
Oops: 0000
CPU:    0
binfmt_misc nfsd autofs nfs lockd sunrpc 8139to mii ide-scsi_mod ide-cd
EIP:    0010:[<c0109db4>]    Not tainted
EFLAGS: 00010002
eax: 00000000   ebx: 00000160   ecx: 064d600b   edx: 00000018
esi: 0000000b   edi: c0322a60   ebp: 00000000   esp: de83defc
ds: 0018   es: 0018   ss: 0018
Process mm5.mpp (pid: 1139 stackpage=de83d000)
Stack: 0000000b dfd14560 064d6008 00000003 e0993000 c021e2a4 dfd14560 064d600b
       001605ea 064d6008 00000003 e0993000 00000000 de830018 ded10018 ffffff0b
       e098e308 00000010 00000246 00000004 e0993000 dfd14400 dfd14560 e098e91a
Call Trace: [<e098e308>] rtl8139_tx_interrupt [8139too] 0x128
[<e098e91a>] rtl8139_interrupt [8139too] 0xba
[<c0109c7a>] handle_IRQ_event [kernel] 0x3a
[<c0109df8>] do_IRQ [kernel] 0x68

Code: ff 50 14 8b 00 29 32 c0 83 e0 d7 83 c8 04 5a a9 03 00 00
<0> kernel panic: Aiee, killing interrupt handler!
In interrupt handler -not syncing
________________________________________________________________________________