[Beowulf] Problems with a JS21 - Ah, the networking...

Ivan Paganini ispmarin at gmail.com
Mon Oct 1 07:06:45 PDT 2007


Hello Mark, Patrick,

>>The spawning phase in MPICH-MX uses socket and ssh (or rsh). Usually,
>>ssh uses native Ethernet, but it could also use IPoM (Ethernet over
>>Myrinet). Which case is it for you ?

As I said before, I'm also experiencing some Ethernet problems on the
service network (TCP window full, lost segments, lost ACK segments),
and I am trying to rule those out as well.

I'm using IPoM, as the manual describes, since I configured each node with

ifconfig myri0 192.168.30.<number>

and associated that address in /etc/hosts with a hostname such as
myriBlade<number>. I am also using ssh and the polling receive method.
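
Concretely, the pattern on each node is roughly this (with <number>
standing in for the node's number, as above):

# on the node
ifconfig myri0 192.168.30.<number>

# corresponding /etc/hosts entry
192.168.30.<number>    myriBlade<number>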

The output of mpirun.ch_mx -v for a run that hangs is below:
___________________________________________
ivan at mamute:~/lib/mpich-mx-1.2.7-5-xl/examples> mpirun.ch_mx -v
--mx-label --mx-kill 30 -machinefile list -np 3 ./cpi
Program binary is: /mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/bin/mpimxlabel
Program binary is: /mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples/./cpi
Machines file is /mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples/list
Processes will be killed 30 after first exits.
mx receive mode used: polling.
3 processes will be spawned:
        Process 0
(/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples/./cpi ) on mamute
        Process 1
(/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples/./cpi ) on mamute
        Process 2
(/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples/./cpi ) on
myriBlade109
Open a socket on mamute...
Got a first socket opened on port 55353.
ssh  mamute  "cd /mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples &&
exec env MXMPI_MAGIC=3366365  MXMPI_MASTER=mamute MXMPI_PORT=55353
MX_DISABLE_SHMEM=0 MXMPI_VERBOSE=1 MXMPI_SIGCATCH=1
LD_LIBRARY_PATH=/usr/lib:/usr/lib64 MXMPI_ID=0 MXMPI_NP=3
MXMPI_BOARD=-1 MXMPI_SLAVE=192.168.15.1
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/bin/mpimxlabel
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples/./cpi    "
ssh  mamute -n "cd /mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples
&& exec env MXMPI_MAGIC=3366365  MXMPI_MASTER=mamute MXMPI_PORT=55353
MX_DISABLE_SHMEM=0 MXMPI_VERBOSE=1 MXMPI_SIGCATCH=1
LD_LIBRARY_PATH=/usr/lib:/usr/lib64 MXMPI_ID=1 MXMPI_NP=3
MXMPI_BOARD=-1 MXMPI_SLAVE=192.168.15.1
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/bin/mpimxlabel
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples/./cpi    "
ssh  myriBlade109 -n "cd
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples && exec env
MXMPI_MAGIC=3366365  MXMPI_MASTER=mamute MXMPI_PORT=55353
MX_DISABLE_SHMEM=0 MXMPI_VERBOSE=1 MXMPI_SIGCATCH=1
LD_LIBRARY_PATH=/usr/lib:/usr/lib64 MXMPI_ID=2 MXMPI_NP=3
MXMPI_BOARD=-1 MXMPI_SLAVE=192.168.30.209
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/bin/mpimxlabel
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/examples/./cpi    "
All processes have been spawned
MPI Id 0 is using mx port 0, board 0 (MAC 0060dd47afe7).
MPI Id 2 is using mx port 0, board 0 (MAC 0060dd478aff).
MPI Id 1 is using mx port 1, board 0 (MAC 0060dd47afe7).
Received data from all 3 MPI processes.
Sending mapping to MPI Id 0.
Sending mapping to MPI Id 1.
Sending mapping to MPI Id 2.
Data sent to all processes.
___________________________________________

and there it hangs. The list file contains

mamute:2
myriBlade109:4
myriBlade108:4

where mamute is my head node, so I can run all the traces there.


>>Ivan may have to stage the binary on local disk prior to spawning, to
>>not rely on GPFS over Ethernet to serve it. Or even run GPFS over IPoM too.

GPFS over Myrinet is not an option right now. I compiled the executable
statically and tested it: same problem. I then staged the binary on the
scratch partition of each node (a plain copy; a rough sketch of it is
after the log below), and the process hung in the same way:

__________________________________________
ivan at mamute:/home/ivan> mpirun.ch_mx -v --mx-label --mx-kill 30
-machinefile list -np 3 ./cpi
Program binary is: /mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/bin/mpimxlabel
Program binary is: /home/ivan/./cpi
Machines file is /home/ivan/list
Processes will be killed 30 after first exits.
mx receive mode used: polling.
3 processes will be spawned:
        Process 0 (/home/ivan/./cpi ) on mamute
        Process 1 (/home/ivan/./cpi ) on mamute
        Process 2 (/home/ivan/./cpi ) on myriBlade109
Open a socket on mamute...
Got a first socket opened on port 55684.
ssh  mamute  "cd /home/ivan && exec env MXMPI_MAGIC=1802255
MXMPI_MASTER=mamute MXMPI_PORT=55684 MX_DISABLE_SHMEM=0
MXMPI_VERBOSE=1 MXMPI_SIGCATCH=1 LD_LIBRARY_PATH=/usr/lib:/usr/lib64
MXMPI_ID=0 MXMPI_NP=3 MXMPI_BOARD=-1 MXMPI_SLAVE=192.168.15.1
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/bin/mpimxlabel
/home/ivan/./cpi    "
ssh  mamute -n "cd /home/ivan && exec env MXMPI_MAGIC=1802255
MXMPI_MASTER=mamute MXMPI_PORT=55684 MX_DISABLE_SHMEM=0
MXMPI_VERBOSE=1 MXMPI_SIGCATCH=1 LD_LIBRARY_PATH=/usr/lib:/usr/lib64
MXMPI_ID=1 MXMPI_NP=3 MXMPI_BOARD=-1 MXMPI_SLAVE=192.168.15.1
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/bin/mpimxlabel
/home/ivan/./cpi    "
ssh  myriBlade109 -n "cd /home/ivan && exec env MXMPI_MAGIC=1802255
MXMPI_MASTER=mamute MXMPI_PORT=55684 MX_DISABLE_SHMEM=0
MXMPI_VERBOSE=1 MXMPI_SIGCATCH=1 LD_LIBRARY_PATH=/usr/lib:/usr/lib64
MXMPI_ID=2 MXMPI_NP=3 MXMPI_BOARD=-1 MXMPI_SLAVE=192.168.30.209
/mamuteData/ivan/lib/mpich-mx-1.2.7-5-xl/bin/mpimxlabel
/home/ivan/./cpi    "
All processes have been spawned
MPI Id 1 is using mx port 0, board 0 (MAC 0060dd47afe7).
MPI Id 2 is using mx port 0, board 0 (MAC 0060dd478aff).
MPI Id 0 is using mx port 1, board 0 (MAC 0060dd47afe7).
Received data from all 3 MPI processes.
Sending mapping to MPI Id 0.
Sending mapping to MPI Id 1.
Sending mapping to MPI Id 2.
Data sent to all processes.
__________________________________________
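
For reference, the staging was just a straight copy to each node, along
these lines (the /scratch path is only indicative of wherever the local
scratch partition is mounted, and the node list is abbreviated):

for node in myriBlade109 myriBlade108; do
    scp ./cpi $node:/scratch/ivan/
done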

I notice, though, that the spawning is _much_ faster than launching the
process from the GPFS partition.

This is the output of strace -f (lots of things here!):
________________________________________
[pid  7498] ioctl(4, TCGETS or TCGETS, 0xffffda30) = -1 EINVAL
(Invalid argument)
[pid  7498] _llseek(4, 0, 0xffffda98, SEEK_CUR) = -1 ESPIPE (Illegal seek)
[pid  7498] fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
[pid  7498] setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid  7498] connect(4, {sa_family=AF_INET, sin_port=htons(55787),
sin_addr=inet_addr("192.168.15.1")}, 16) = 0
[pid  7498] write(1, "Sending mapping to MPI Id 1.\n", 29Sending
mapping to MPI Id 1.
) = 29
[pid  7498] send(4, "[[[<0:96:3712462823:0><1:96:3712"..., 72, 0) = 72
[pid  7498] close(4)                    = 0
[pid  7498] time([1191247146])          = 1191247146
[pid  7498] open("/etc/hosts", O_RDONLY) = 4
[pid  7498] fcntl64(4, F_GETFD)         = 0
[pid  7498] fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
[pid  7498] fstat64(4, {st_mode=S_IFREG|0644, st_size=10247, ...}) = 0
[pid  7498] mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40018000
[pid  7498] read(4, "#\n# hosts         This file desc"..., 4096) = 4096
[pid  7498] read(4, "yriBlade077\n192.168.30.178  myri"..., 4096) = 4096
[pid  7498] read(4, " blade067 blade067.lcca.usp.br\n1"..., 4096) = 2055
[pid  7498] read(4, "", 4096)           = 0
[pid  7498] close(4)                    = 0
[pid  7498] munmap(0x40018000, 4096)    = 0
[pid  7498] open("/etc/protocols", O_RDONLY) = 4
[pid  7498] fcntl64(4, F_GETFD)         = 0
[pid  7498] fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
[pid  7498] fstat64(4, {st_mode=S_IFREG|0644, st_size=6561, ...}) = 0
[pid  7498] mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40018000
[pid  7498] read(4, "#\n# protocols\tThis file describe"..., 4096) = 4096
[pid  7498] close(4)                    = 0
[pid  7498] munmap(0x40018000, 4096)    = 0
[pid  7498] socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
[pid  7498] ioctl(4, TCGETS or TCGETS, 0xffffda30) = -1 EINVAL
(Invalid argument)
[pid  7498] _llseek(4, 0, 0xffffda98, SEEK_CUR) = -1 ESPIPE (Illegal seek)
[pid  7498] ioctl(4, TCGETS or TCGETS, 0xffffda30) = -1 EINVAL
(Invalid argument)
[pid  7498] _llseek(4, 0, 0xffffda98, SEEK_CUR) = -1 ESPIPE (Illegal seek)
[pid  7498] fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
[pid  7498] setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid  7498] connect(4, {sa_family=AF_INET, sin_port=htons(45412),
sin_addr=inet_addr("192.168.30.209")}, 16) = 0
[pid  7498] write(1, "Sending mapping to MPI Id 2.\n", 29Sending
mapping to MPI Id 2.
) = 29
[pid  7498] send(4, "[[[<0:96:3712462823:0><1:96:3712"..., 69, 0) = 69
[pid  7498] close(4)                    = 0
[pid  7498] alarm(0)                    = 0
[pid  7498] write(1, "Data sent to all processes.\n", 28Data sent to
all processes.
) = 28
[pid  7498] accept(3,  <unfinished ...>
[pid  7499] <... select resumed> )      = 1 (in [3])
[pid  7499] read(3,
"\302\317\32\275\357jD\230\222=\270N\341F\237\326@]\4\4"..., 8192) =
80
[pid  7499] select(7, [3 4], [6], NULL, NULL) = 1 (out [6])
[pid  7499] write(6, "0: Process 0 on mamute.lcca.usp."..., 350:
Process 0 on mamute.lcca.usp.br
) = 35
[pid  7499] select(7, [3 4], [], NULL, NULL
________________________________________

and hangs. This was with the binary _out_ of GPFS and statically compiled.

My ticket number is 53912, and Ruth and Scott are helping me.

Mark, ltrace does not accept mpirun.ch_mx as a valid ELF binary... it
was compiled with the xlc compiler. Strange, because ltrace works with
other system binaries (like ls...).
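
In other words, an invocation along these lines (the -f and -o options
are just illustrative, and the output file name is only an example) is
what fails immediately on that binary:

ltrace -S -f -o /tmp/ltrace.out mpirun.ch_mx -v --mx-label --mx-kill 30 \
    -machinefile list -np 3 ./cpi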

Thank you very much!!

Ivan

2007/10/1, Mark Hahn <hahn at mcmaster.ca>:
> > clone(child_stack=0,
> > flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> > child_tidptr=0x40046f68) = 31384
> > waitpid(-1,
>
> this looks like a fork/exec that's failing.  as you might expect
> if, for instance, your shared FS doesn't supply a binary successfully.
> note also that ltrace -S often provides somewhat more intelligible
> diags for this kind of thing (since it might show what's actually
> being exec'ed.)
>


-- 
-----------------------------------------------------------
Ivan S. P. Marin
----------------------------------------------------------


