[realtek] rtl8139_tx_interrupt [8139too] problem in Linux cluster
   
Donald Becker
becker@scyld.com
Sat Sep 14 10:37:01 2002

On Thu, 5 Sep 2002, Narisara Thongboonchoo wrote:
> I had trouble with a 4-node Linux cluster when running a program with MPI and
> ssh. I couldn't finish my job because one of the 4 nodes
> kept dying at random.
The same node, or different nodes?
  If it's the same node every time, you shouldn't be looking for a
  software fix.
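A minimal Python sketch of that triage question, assuming the crashes have been recorded as a list of (hypothetical) node hostnames, one entry per crash. The `triage` helper, its two-crash minimum, and the mapping of "spread across nodes" to a shared software cause are illustrative assumptions, not part of any existing tool.

```python
def triage(crashed_nodes):
    """Apply the same-node-or-different-nodes question to a crash history.

    crashed_nodes: hypothetical hostnames of the node that died, one per crash.
    """
    if len(crashed_nodes) < 2:
        # A single crash says nothing about a pattern.
        return "not enough data"
    if len(set(crashed_nodes)) == 1:
        # Every crash on the same machine: look at that node's hardware
        # (memory, NIC) rather than for a software fix.
        return f"hardware: check {crashed_nodes[0]}"
    # Assumption: crashes spread across nodes point at something they
    # share, such as the NIC driver.
    return "software: suspect a shared component such as the driver"


# Hypothetical crash histories for a 4-node cluster.
print(triage(["node3", "node3", "node3"]))
print(triage(["node1", "node3", "node2"]))
```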
> The job was killed because there was no route to that machine. I'm not
> sure why it happened, but I found these error messages about
> rtl8139_tx_interrupt & rtl8139_interrupt. Is it possible that network
> communication causes this problem? If so, could you give me any
> suggestions?
If this isn't a memory problem, then it's a device driver problem.  No
user-level software should be able to cause this type of kernel error.
> Call Trace: [<e098e308>] rtl8139_tx_interrupt [8139too] 0x128
> [<e098e91a>] rtl8139_interrupt [8139too] 0xba
> [<c0109c7a>] handle_IRQ_event [kernel] 0x3a
> [<c0109df8>] do_IRQ [kernel] 0x68
> 
> Code: ff 50 14 8b 00 29 32 c0 83 e0 d7 83 c8 04 5a a9 03 00 00
> <0> kernel panic: Aiee, killing interrupt handler!
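Each line of the quoted trace reads as an address, a function, the module it belongs to, and what appears to be the offset within that function. The first entry is the most deeply nested frame the trace shows, and it sits in the 8139too driver module, which matches the reading above. A minimal Python sketch that splits such lines, assuming exactly the `[<addr>] function [module] 0xoffset` layout shown here (other kernel versions format oopses differently); `parse_trace` and `innermost_module` are hypothetical helpers.

```python
import re

# One frame in this trace layout: [<address>] function [module] 0xoffset
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\s+\[(\w+)\]\s+(0x[0-9a-f]+)")


def parse_trace(text):
    """Return (address, function, module, offset) tuples in the order listed."""
    frames = []
    for m in FRAME_RE.finditer(text):
        addr, func, module, offset = m.groups()
        frames.append((int(addr, 16), func, module, int(offset, 16)))
    return frames


def innermost_module(frames):
    """Module of the first (most deeply nested) listed frame, or None."""
    return frames[0][2] if frames else None


trace = """Call Trace: [<e098e308>] rtl8139_tx_interrupt [8139too] 0x128
[<e098e91a>] rtl8139_interrupt [8139too] 0xba
[<c0109c7a>] handle_IRQ_event [kernel] 0x3a
[<c0109df8>] do_IRQ [kernel] 0x68"""

frames = parse_trace(trace)
for addr, func, module, offset in frames:
    print(f"{module:8} {func}+{offset:#x} at {addr:#010x}")
print("innermost frame is in:", innermost_module(frames))
```

Reading the frames bottom-up gives the path into the crash: the IRQ entry (`do_IRQ`, `handle_IRQ_event`) dispatched to the driver's interrupt handler, which called its transmit-completion routine.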
-- 
Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993