[Beowulf] Problems with a JS21 - Ah, the networking...
Ivan Paganini
ispmarin at gmail.com
Mon Oct 1 07:06:45 PDT 2007
Hello Mark, Patrick,
>>The spawning phase in MPICH-MX uses socket and ssh (or rsh). Usually,
>>ssh uses native Ethernet, but it could also use IPoM (Ethernet over
>>Myrinet). Which case is it for you ?
As I said before, I'm also experiencing some Ethernet problems (on the
service network) such as TCP window full, lost segments, and ACKed lost
segments, and I am trying to rule those out as well.
I'm using IPoM, as the manual describes: I configured each node with
ifconfig myri0 192.168.30.<number>
and associated that address in /etc/hosts with a hostname such as
myriBlade<number>. I am also using ssh and the polling method.
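Just to make the mapping concrete, the /etc/hosts entries look roughly
like this (the myriBlade109 line matches the MXMPI_SLAVE address in the
output below; the second line is only illustrative of the pattern):
192.168.30.209   myriBlade109
192.168.30.208   myriBlade108   # hypothetical entry, same pattern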
The output of mpirun.ch_mx -v for a run that hangs is below:
___________________________________________
ivan at mamute:~/lib/mpich-mx-1.2.7-5-xl/examples> mpirun.ch_mx -v
--mx-label --mx-kill 30 -machinefile list -np 3 ./cpi
Program binary is: /mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/bin/mpimxlabel
Program binary is: /mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples/./cpi
Machines file is /mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples/list
Processes will be killed 30 after first exits.
mx receive mode used: polling.
3 processes will be spawned:
Process 0
(/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples/./cpi ) on mamute
Process 1
(/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples/./cpi ) on mamute
Process 2
(/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples/./cpi ) on
myriBlade109
Open a socket on mamute...
Got a first socket opened on port 55353.
ssh mamute "cd /mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples &&
exec env MXMPI_MAGIC=3366365 MXMPI_MASTER=mamute MXMPI_PORT=55353
MX_DISABLE_SHMEM=0 MXMPI_VERBOSE=1 MXMPI_SIGCATCH=1
LD_LIBRARY_PATH=/usr/lib:/usr/lib64 MXMPI_ID=0 MXMPI_NP=3
MXMPI_BOARD=-1 MXMPI_SLAVE=192.168.15.1
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/bin/mpimxlabel
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples/./cpi "
ssh mamute -n "cd /mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples
&& exec env MXMPI_MAGIC=3366365 MXMPI_MASTER=mamute MXMPI_PORT=55353
MX_DISABLE_SHMEM=0 MXMPI_VERBOSE=1 MXMPI_SIGCATCH=1
LD_LIBRARY_PATH=/usr/lib:/usr/lib64 MXMPI_ID=1 MXMPI_NP=3
MXMPI_BOARD=-1 MXMPI_SLAVE=192.168.15.1
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/bin/mpimxlabel
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples/./cpi "
ssh myriBlade109 -n "cd
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples && exec env
MXMPI_MAGIC=3366365 MXMPI_MASTER=mamute MXMPI_PORT=55353
MX_DISABLE_SHMEM=0 MXMPI_VERBOSE=1 MXMPI_SIGCATCH=1
LD_LIBRARY_PATH=/usr/lib:/usr/lib64 MXMPI_ID=2 MXMPI_NP=3
MXMPI_BOARD=-1 MXMPI_SLAVE=192.168.30.209
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/bin/mpimxlabel
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples/./cpi "
All processes have been spawned
MPI Id 0 is using mx port 0, board 0 (MAC 0060dd47afe7).
MPI Id 2 is using mx port 0, board 0 (MAC 0060dd478aff).
MPI Id 1 is using mx port 1, board 0 (MAC 0060dd47afe7).
Received data from all 3 MPI processes.
Sending mapping to MPI Id 0.
Sending mapping to MPI Id 1.
Sending mapping to MPI Id 2.
Data sent to all processes.
___________________________________________
and it hung there. The machine file (list) contains
mamute:2
myriBlade109:4
myriBlade108:4
where mamute is my head node, so I can run all the traces there.
>>Ivan may have to stage the binary on local disk prior to spawning, to
>>not rely on GPFS over Ethernet to serve it. Or even run GPFS over IPoM too.
GPFS over Myrinet is not an option right now. I compiled the executable
statically and tested it; same problem. I then staged the binary on the
scratch partition of each node (the staging step is sketched a bit
further below), and the process hung in exactly the same way:
__________________________________________
ivan at mamute:/home/ivan> mpirun.ch_mx -v --mx-label --mx-kill 30
-machinefile list -np 3 ./cpi
Program binary is: /mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/bin/mpimxlabel
Program binary is: /home/ivan/./cpi
Machines file is /home/ivan/list
Processes will be killed 30 after first exits.
mx receive mode used: polling.
3 processes will be spawned:
Process 0 (/home/ivan/./cpi ) on mamute
Process 1 (/home/ivan/./cpi ) on mamute
Process 2 (/home/ivan/./cpi ) on myriBlade109
Open a socket on mamute...
Got a first socket opened on port 55684.
ssh mamute "cd /home/ivan && exec env MXMPI_MAGIC=1802255
MXMPI_MASTER=mamute MXMPI_PORT=55684 MX_DISABLE_SHMEM=0
MXMPI_VERBOSE=1 MXMPI_SIGCATCH=1 LD_LIBRARY_PATH=/usr/lib:/usr/lib64
MXMPI_ID=0 MXMPI_NP=3 MXMPI_BOARD=-1 MXMPI_SLAVE=192.168.15.1
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/bin/mpimxlabel
/home/ivan/./cpi "
ssh mamute -n "cd /home/ivan && exec env MXMPI_MAGIC=1802255
MXMPI_MASTER=mamute MXMPI_PORT=55684 MX_DISABLE_SHMEM=0
MXMPI_VERBOSE=1 MXMPI_SIGCATCH=1 LD_LIBRARY_PATH=/usr/lib:/usr/lib64
MXMPI_ID=1 MXMPI_NP=3 MXMPI_BOARD=-1 MXMPI_SLAVE=192.168.15.1
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/bin/mpimxlabel
/home/ivan/./cpi "
ssh myriBlade109 -n "cd /home/ivan && exec env MXMPI_MAGIC=1802255
MXMPI_MASTER=mamute MXMPI_PORT=55684 MX_DISABLE_SHMEM=0
MXMPI_VERBOSE=1 MXMPI_SIGCATCH=1 LD_LIBRARY_PATH=/usr/lib:/usr/lib64
MXMPI_ID=2 MXMPI_NP=3 MXMPI_BOARD=-1 MXMPI_SLAVE=192.168.30.209
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/bin/mpimxlabel
/home/ivan/./cpi "
All processes have been spawned
MPI Id 1 is using mx port 0, board 0 (MAC 0060dd47afe7).
MPI Id 2 is using mx port 0, board 0 (MAC 0060dd478aff).
MPI Id 0 is using mx port 1, board 0 (MAC 0060dd47afe7).
Received data from all 3 MPI processes.
Sending mapping to MPI Id 0.
Sending mapping to MPI Id 1.
Sending mapping to MPI Id 2.
Data sent to all processes.
__________________________________________
I notice, though, that the spawning is _much_ faster than launching the
process from the GPFS partition.
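For reference, the staging itself was nothing fancy, just a copy of the
static binary to each node, roughly along these lines (the local scratch
path here is illustrative, not the real one):
# hypothetical staging loop; /scratch/ivan is an assumed local path
for n in myriBlade108 myriBlade109; do
    scp ./cpi $n:/scratch/ivan/cpi
done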
This is the output of strace -f (lots of things here!):
________________________________________
[pid 7498] ioctl(4, TCGETS or TCGETS, 0xffffda30) = -1 EINVAL
(Invalid argument)
[pid 7498] _llseek(4, 0, 0xffffda98, SEEK_CUR) = -1 ESPIPE (Illegal seek)
[pid 7498] fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
[pid 7498] setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 7498] connect(4, {sa_family=AF_INET, sin_port=htons(55787),
sin_addr=inet_addr("192.168.15.1")}, 16) = 0
[pid 7498] write(1, "Sending mapping to MPI Id 1.\n", 29Sending
mapping to MPI Id 1.
) = 29
[pid 7498] send(4, "[[[<0:96:3712462823:0><1:96:3712"..., 72, 0) = 72
[pid 7498] close(4) = 0
[pid 7498] time([1191247146]) = 1191247146
[pid 7498] open("/etc/hosts", O_RDONLY) = 4
[pid 7498] fcntl64(4, F_GETFD) = 0
[pid 7498] fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
[pid 7498] fstat64(4, {st_mode=S_IFREG|0644, st_size=10247, ...}) = 0
[pid 7498] mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40018000
[pid 7498] read(4, "#\n# hosts This file desc"..., 4096) = 4096
[pid 7498] read(4, "yriBlade077\n192.168.30.178 myri"..., 4096) = 4096
[pid 7498] read(4, " blade067 blade067.lcca.usp.br\n1"..., 4096) = 2055
[pid 7498] read(4, "", 4096) = 0
[pid 7498] close(4) = 0
[pid 7498] munmap(0x40018000, 4096) = 0
[pid 7498] open("/etc/protocols", O_RDONLY) = 4
[pid 7498] fcntl64(4, F_GETFD) = 0
[pid 7498] fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
[pid 7498] fstat64(4, {st_mode=S_IFREG|0644, st_size=6561, ...}) = 0
[pid 7498] mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40018000
[pid 7498] read(4, "#\n# protocols\tThis file describe"..., 4096) = 4096
[pid 7498] close(4) = 0
[pid 7498] munmap(0x40018000, 4096) = 0
[pid 7498] socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
[pid 7498] ioctl(4, TCGETS or TCGETS, 0xffffda30) = -1 EINVAL
(Invalid argument)
[pid 7498] _llseek(4, 0, 0xffffda98, SEEK_CUR) = -1 ESPIPE (Illegal seek)
[pid 7498] ioctl(4, TCGETS or TCGETS, 0xffffda30) = -1 EINVAL
(Invalid argument)
[pid 7498] _llseek(4, 0, 0xffffda98, SEEK_CUR) = -1 ESPIPE (Illegal seek)
[pid 7498] fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
[pid 7498] setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 7498] connect(4, {sa_family=AF_INET, sin_port=htons(45412),
sin_addr=inet_addr("192.168.30.209")}, 16) = 0
[pid 7498] write(1, "Sending mapping to MPI Id 2.\n", 29Sending
mapping to MPI Id 2.
) = 29
[pid 7498] send(4, "[[[<0:96:3712462823:0><1:96:3712"..., 69, 0) = 69
[pid 7498] close(4) = 0
[pid 7498] alarm(0) = 0
[pid 7498] write(1, "Data sent to all processes.\n", 28Data sent to
all processes.
) = 28
[pid 7498] accept(3, <unfinished ...>
[pid 7499] <... select resumed> ) = 1 (in [3])
[pid 7499] read(3,
"\302\317\32\275\357jD\230\222=\270N\341F\237\326@]\4\4"..., 8192) =
80
[pid 7499] select(7, [3 4], [6], NULL, NULL) = 1 (out [6])
[pid 7499] write(6, "0: Process 0 on mamute.lcca.usp."..., 350:
Process 0 on mamute.lcca.usp.br
) = 35
[pid 7499] select(7, [3 4], [], NULL, NULL
________________________________________
and hangs. This was with the binary _out_ of GPFS and statically compiled.
My ticket number is 53912, and Ruth and Scott are helping me.
Mark, ltrace does not accept mpirun.ch_mx as a valid ELF binary... it
was compiled with the xlc compiler. Strange, because ltrace works fine
with other system binaries (like ls).
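In case it helps pin down why ltrace rejects it, the quick check I would
try is the standard binutils inspection (nothing here is xlc-specific,
so this is a guess at the diagnosis path rather than a known fix):
file $(which mpirun.ch_mx)
readelf -h $(which mpirun.ch_mx)
If it turns out to be a wrapper script, or an ELF class that ltrace does
not handle, that would explain the refusal.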
Thank you very much!!
Ivan
2007/10/1, Mark Hahn <hahn at mcmaster.ca>:
> > clone(child_stack=0,
> > flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> > child_tidptr=0x40046f68) = 31384
> > waitpid(-1,
>
> this looks like a fork/exec that's failing. as you might expect
> if, for instance, your shared FS doesn't supply a binary successfully.
> note also that ltrace -S often provides somewhat more intelligible
> diags for this kind of thing (since it might show what's actually
> being exec'ed.)
>
--
-----------------------------------------------------------
Ivan S. P. Marin
----------------------------------------------------------