Problems with MPICH 1.2 and Beowulf/Linux
Francesco Marini
marini at pcmenelao.mi.infn.it
Tue Jun 6 06:55:21 PDT 2000
Hi all,
I've got a really weird problem with MPICH 1.2.
The system consists of a server and 16 computing nodes, all
diskless, mounting root via NFS from the server. It works very well
with PVM and LAM-MPI.
Now I'm trying to build the latest MPICH source. The make
process goes well, but when I run "make testing" I get this output
(repeated for all tests using more than one machine):
*** Testing MPI_Test ***
pcwalhalla : Mon May 29 16:27:09 CEST 2000
/work/staff/marini/mpich-1.2.0/bin/mpicc -DUSE_SOCKLEN_T
-DUSE_U_INT_FOR_XDR -DFORTRANUNDERSCORE -DHAVE_MPICHCONF_H
-DHAVE_STDLIB_H=1 -DUSE_STDARG=1 -DHAVE_LONG_DOUBLE=1
-DHAVE_LONG_LONG_INT=1 -DHAVE_PROTOTYPES=1 -DHAVE_SIGNAL_H=1
-DHAVE_SIGACTION=1 -c persistent.c
/work/staff/marini/mpich-1.2.0/bin/mpicc -o persistent persistent.o
*** Testing MPI_Recv_init ***
Differences in persistent.out
2,5c2,8
< rm_3383: p4_error: rm_start: net_conn_to_listener failed: 3165
< p0_20161: p4_error: Timeout in making connection to remote process on
node1: 0
< bm_list_20162: p4_error: interrupt SIGINT: 2
< rm_l_1_20168: p4_error: interrupt SIGINT: 2
---
> Receiving message 1
> Received message 1
> Receiving message 2
> Received message 2
> Receiving message 3
> Received message 3
> Completed all receives
7d9
< rm_20167: p4_error: interrupt SIGINT: 2
pcwalhalla : Mon May 29 16:32:12 CEST 2000
/work/staff/marini/mpich-1.2.0/bin/mpicc -DUSE_SOCKLEN_T
-DUSE_U_INT_FOR_XDR -DFORTRANUNDERSCORE -DHAVE_MPICHCONF_H
-DHAVE_STDLIB_H=1 -DUSE_STDARG=1 -DHAVE_LONG_DOUBLE=1
-DHAVE_LONG_LONG_INT=1 -DHAVE_PROTOTYPES=1 -DHAVE_SIGNAL_H=1
-DHAVE_SIGACTION=1 -c persist.c
/work/staff/marini/mpich-1.2.0/bin/mpicc -o persist persist.o
*** Testing MPI_Startall/Request_free ***
Differences in persist.out
2,5c2
< rm_3388: p4_error: rm_start: net_conn_to_listener failed: 3171
< p0_20318: p4_error: Timeout in making connection to remote process on
node1: 0
< bm_list_20319: p4_error: interrupt SIGINT: 2
< rm_l_1_20325: p4_error: interrupt SIGINT: 2
---
> No errors
7d3
< rm_20324: p4_error: interrupt SIGINT: 2
pcwalhalla : Mon May 29 16:37:14 CEST 2000
/work/staff/marini/mpich-1.2.0/bin/mpicc -DUSE_SOCKLEN_T
-DUSE_U_INT_FOR_XDR -DFORTRANUNDERSCORE -DHAVE_MPICHCONF_H
-DHAVE_STDLIB_H=1 -DUSE_STDARG=1 -DHAVE_LONG_DOUBLE=1
-DHAVE_LONG_LONG_INT=1 -DHAVE_PROTOTYPES=1 -DHAVE_SIGNAL_H=1
-DHAVE_SIGACTION=1 -c persist2.c
/work/staff/marini/mpich-1.2.0/bin/mpicc -o persist2 persist2.o
*** Testing MPI_Startall(Bsend)/Request_free ***
Differences in persist2.out
2,5c2
< rm_3391: p4_error: rm_start: net_conn_to_listener failed: 3177
< p0_20473: p4_error: Timeout in making connection to remote process on
node1: 0
< bm_list_20474: p4_error: interrupt SIGINT: 2
< rm_l_1_20480: p4_error: interrupt SIGINT: 2
---
It seems that MPICH either cannot start the remote process or cannot
establish the connection to it. The strange thing is that with PVM and
LAM-MPI everything works fine.
Any ideas?
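One check that might help narrow this down (a sketch only, resting on
an assumption, not a confirmed diagnosis): the rm_start /
net_conn_to_listener errors above suggest that the remote process was
started but then failed to connect back to the server. If that is the
case, it is worth looking at what each machine's hostname resolves to,
since a name that maps to a 127.x.x.x address cannot be reached from
another host. The helper names below (resolve_ipv4, loopback_addresses,
report) are made up for illustration and are not part of MPICH.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses that hostname resolves to on this machine."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def loopback_addresses(addresses):
    """Return the subset of addresses that lie in the loopback range 127.0.0.0/8."""
    return [a for a in addresses if ipaddress.ip_address(a).is_loopback]


def report(hostnames):
    """Build one human-readable line per hostname, flagging loopback results."""
    lines = []
    for name in hostnames:
        try:
            addrs = resolve_ipv4(name)
        except (socket.gaierror, UnicodeError) as exc:
            # The name could not be resolved at all on this machine.
            lines.append(f"{name}: cannot resolve ({exc})")
            continue
        flag = "  <-- loopback address" if loopback_addresses(addrs) else ""
        lines.append(f"{name}: {', '.join(addrs)}{flag}")
    return lines
```

Running print("\n".join(report(["pcwalhalla", "node1"]))) once on the
server and once on a node, and comparing the two results, would show
whether both sides agree on the addresses. If they do, this
explanation can be ruled out.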
Second: I've got some problems compiling ScaLAPACK with LAM-MPI, gcc
and pgf77 (the f77 compiler from the Portland Group). The link step
gives a lot of unresolved symbols related to MPI. Has anyone succeeded
in compiling it under the same configuration?
Thank you all in advance,
Franz Marini
---------------------------------------------
Franz Marini
Sys Admin and Software Analyst,
Dept. of Physics, University of Milan, Italy.
email : marini at pcmenelao.mi.infn.it
---------------------------------------------