[Beowulf] Problems with a JS21 - Ah, the networking...

Ivan Paganini ispmarin at gmail.com
Mon Oct 1 05:34:46 PDT 2007


Hello Chris, everybody:

I am not using jumbo frames. I am now considering that option, but
first I want to be sure that there is no other problem, just to keep
the number of variables under control. Thanks for your help.

I ran strace on the hung process, and this is the output:
______________________________________________

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40176000
read(4, "#\n# hosts         This file desc"..., 4096) = 4096
read(4, "yriBlade077\n192.168.30.178  myri"..., 4096) = 4096
read(4, " blade067 blade067.lcca.usp.br\n1"..., 4096) = 2055
read(4, "", 4096)                       = 0
close(4)                                = 0
munmap(0x40176000, 4096)                = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x40046f68) = 25994
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x40046f68) = 25995
brk(0x102ab000)                         = 0x102ab000
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x40046f68) = 25996
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1,

______________________________________________
and nothing more after that. I'm now trying to get a better
understanding of what is happening.

Thank you.

Ivan


2007/9/29, Chris Samuel <csamuel at vpac.org>:
> On Sat, 29 Sep 2007, Ivan Paganini wrote:
>
> > I sniffed the network in the store nodes interface, and I got lots
> > of TCP lost fragment, previous lost fragments, ack lost fragments
> > and TCP window size full.
>
> Some suggestions would be to check that all network interfaces are
> negotiating gigabit back to the switch, and that if you are using
> jumbo frames then all interfaces are indeed using jumbo frames.
>
> A useful check to verify 2 way jumbo frames connectivity is by using
> the ping command, doing:
>
> ping -c 1 -M do -s 8900 $hostname
>
> should tell you whether or not it is working.
>
> Best of luck!
> Chris
> --
> Christopher Samuel - (03) 9925 4751 - Systems Manager
>  The Victorian Partnership for Advanced Computing
>  P.O. Box 201, Carlton South, VIC 3053, Australia
> VPAC is a not-for-profit Registered Research Agency
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
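
On the ping check quoted above: the -s value is only the ICMP payload,
so the packet on the wire is larger by the IP and ICMP headers, and
-M do forbids fragmentation. The sketch below works out the arithmetic.
It assumes IPv4 without options (20-byte header), an 8-byte ICMP echo
header, and an interface MTU of 9000 for jumbo frames; the MTU value and
the function names are illustrative assumptions, not taken from the
thread.

```python
# Assumed header sizes: IPv4 without options and ICMP echo request.
IPV4_HEADER = 20
ICMP_HEADER = 8


def max_ping_payload(mtu: int) -> int:
    """Largest `ping -s` value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def fits_without_fragmenting(payload: int, mtu: int) -> bool:
    """True if a ping with this payload fits inside the given MTU."""
    return payload <= max_ping_payload(mtu)


if __name__ == "__main__":
    # 1500 is the standard Ethernet MTU; 9000 is an assumed jumbo MTU.
    for mtu in (1500, 9000):
        print(mtu, max_ping_payload(mtu), fits_without_fragmenting(8900, mtu))
```

Under these assumptions an 8900-byte payload fits within a 9000-byte MTU
but not within the standard 1500, so the quoted ping should only succeed
when every hop on the path really has jumbo frames enabled.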


-- 
-----------------------------------------------------------
Ivan S. P. Marin
----------------------------------------------------------


