[tulip] Network Suspended

Young seok Kang electra@nownuri.net
Mon Sep 9 09:19:00 2002


Hello,

I'm not sure this is the right place to post this message.
I have a problem with network drop-outs. I have found reports of similar
symptoms on the Internet, but I haven't found any clear solution.

My cluster consists of 10 nodes, each with the following specification:

Board : ASUS-AV7266C
CPU   : AMD XP1900+
LAN   : ACCTON 100MBPS (Tulip chipset)
RAM   : 1.5GB per node
VGA   : Radeon Ve
OS    : Red Hat Linux 7.3 (kernel 2.4.x)
HUB   : 3COM 24p 3C 16593 100M

On this cluster I run a parallel CFD program, CFX-TASCflow, using PVM.
If the number of cells is not too large (so network traffic is light),
there is no problem. But when network traffic increases, the network
suddenly halts.

After the system halted, I found messages like these in the log file:

Aug 31 19:42:30 tp1 kernel: eth1: Restarted Rx at 6747893 / 6747893.
Aug 31 19:42:30 tp1 kernel: eth1: Restarted Rx at 6753004 / 6753004.
Aug 31 19:42:30 tp1 kernel: eth1: Restarted Rx at 6753761 / 6753761.
Aug 31 19:42:30 tp1 kernel: eth1: Restarted Rx at 6753841 / 6753841.
Aug 31 19:42:30 tp1 kernel: eth1: Restarted Rx at 6755194 / 6755194.
Aug 31 19:42:30 tp1 kernel: eth1: Restarted Rx at 6755450 / 6755450.
Aug 31 19:42:30 tp1 kernel: eth1: Restarted Rx at 6755706 / 6755706.
Aug 31 19:42:30 tp1 kernel: eth1: Restarted Rx at 6755962 / 6755962.
Aug 31 19:42:30 tp1 kernel: eth1: Restarted Rx at 6756218 / 6756218.
Aug 31 19:42:30 tp1 kernel: eth1: Restarted Rx at 6756416 / 6756416.
Aug 31 19:42:30 tp1 kernel: eth1: Restarted Rx at 6756928 / 6756928.
Aug 31 19:42:30 tp1 kernel: eth1: Restarted Rx at 6756986 / 6756986.
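
A burst like the one above is easier to judge when the restarts are counted per second and per interface. Below is a minimal Python sketch, assuming the syslog line format shown above. The function name count_rx_restarts and the sample list are hypothetical and only for illustration; they are not part of the tulip driver or of any existing tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   Aug 31 19:42:30 tp1 kernel: eth1: Restarted Rx at 6747893 / 6747893.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?P<iface>eth\d+):\s+Restarted Rx at\s+\d+\s*/\s*\d+\."
)


def count_rx_restarts(lines):
    """Count 'Restarted Rx' messages, keyed by (timestamp, host, interface)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m.group("stamp"), m.group("host"), m.group("iface"))] += 1
    return counts


# Hypothetical sample: two lines copied from the log above plus one
# made-up unrelated line that the pattern should ignore.
sample = [
    "Aug 31 19:42:30 tp1 kernel: eth1: Restarted Rx at 6747893 / 6747893.",
    "Aug 31 19:42:30 tp1 kernel: eth1: Restarted Rx at 6753004 / 6753004.",
    "Aug 31 19:42:30 tp1 kernel: eth0: some unrelated message",
]

if __name__ == "__main__":
    for (stamp, host, iface), n in sorted(count_rx_restarts(sample).items()):
        print(f"{stamp} {host} {iface}: {n} Rx restarts")
```

To run it on a real log, pass an open file object instead of the sample list; the log file path depends on the syslog configuration, so it is left out here.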


Some solutions have been suggested for similar symptoms.

I recompiled the tulip module after increasing max_interrupt_work
and RX_RING_SIZE, but that did not solve my problem.

Does anyone have any suggestions?

****************************************************
Young seok kang
Seoul National Univ. Turbomachinery Lab.
URL http://turbo.snu.ac.kr
TEL 82 (2) 880-7118
****************************************************
 

 
