[Beowulf] newbie question about mpich2 on heterogenous cluster
baenni at kiecks.de
Tue Mar 22 05:03:50 PST 2005
Dear List
I installed mpich2-1.0 on my little cluster (2 Linux nodes and 3 Solaris
nodes). At first I worked only on the two Linux nodes, where the programs run
without trouble. But as soon as I include the Solaris nodes, i.e. when I run
the programs on a heterogeneous cluster, I end up with the error messages
shown below. For some reason, the -arch parameter is not implemented in
mpich2-1.0.
Does anyone have experience with such problems? Can I run mpich2 on a
heterogeneous cluster?
Thanks in advance for any help
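In case it helps to narrow things down: the failing call is a plain broadcast
of one integer from rank 0, so a stripped-down test along these lines (just a
sketch, not the actual cpi source) would exercise exactly the call that shows
up in the error stacks:

/* bcast_test.c -- minimal broadcast test (a sketch, not the real cpi).
 * It performs the same operation that appears in the error stacks below:
 * MPI_Bcast of a single MPI_INT from rank 0 over MPI_COMM_WORLD. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        value = 42;                 /* root supplies the value to broadcast */

    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d received %d\n", rank, value);

    MPI_Finalize();
    return 0;
}

The mpiexec command I use and the output I get are: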
mpiexec -n 1 -host shaw -path /home1/00117cfd/CFD_3D/example/PARALLEL/cpi _cpi : \
        -n 1 -host devienne -path /home1/00117cfd/CFD_3D/example/PARALLEL/cpi _cpi : \
        -n 1 -host gallay -path /export/home/baenni/example/PARALLEL/cpi _cpi : \
        -n 2 -host gallay1 -path /export/home/baenni/example/PARALLEL/cpi _cpi
aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(821): MPI_Bcast(buf=0x8145480, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(229):
MPIC_Send(48):
MPIC_Wait(308):
MPIDI_CH3_Progress_wait(207): an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(492):
connection_recv_fail(1728):
MPIDU_Socki_handle_read(590): connection closed by peer (set=0,sock=1)
aborting job:
Fatal error in MPI_Bcast: Internal MPI error!, error stack:
MPI_Bcast(821): MPI_Bcast(buf=1786e0, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(197):
MPIC_Recv(98):
MPIC_Wait(308):
MPIDI_CH3_Progress_wait(207): an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(849): [ch3:sock] received packet of unknown type (369098752)
rank 4 in job 19 shaw_33110 caused collective abort of all ranks
exit status of rank 4: killed by signal 9