[Beowulf] newbie question about mpich2 on heterogenous cluster

baenni at kiecks.de baenni at kiecks.de
Tue Mar 22 05:03:50 PST 2005


Dear List

I installed mpich2-1.0 on my little cluster (2 Linux nodes and 3 Solaris 
nodes). At first I worked only on the two Linux nodes, where the programs run 
without trouble. But when I want to involve the Solaris nodes as well, i.e. 
when I run the programs on a heterogeneous cluster, it ends up in the error 
messages quoted below. For some reason, the -arch parameter is not implemented in 
mpich2-1.0. 
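
In case it helps, the error stack shows the job dying in an MPI_Bcast of a 
single int, so a broadcast test as small as the following (my own stripped-down 
sketch, not the cpi example that ships with MPICH2) should exercise the same 
call. I compile it with mpicc and launch it the same way as the command further 
down:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, n = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        n = 100;   /* arbitrary value to broadcast */

    /* same call as in the error stack: one int from rank 0 to all ranks */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d of %d got n = %d\n", rank, size, n);

    MPI_Finalize();
    return 0;
}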

Does anyone have experience with such problems? Can I run mpich2 on a 
heterogeneous cluster?

Thanks in advance for any help





mpiexec -n 1 -host shaw -path /home1/00117cfd/CFD_3D/example/PARALLEL/cpi 
_cpi : -n 1 -host devienne  -path /home1/00117cfd/CFD_3D/example/PARALLEL/cpi 
_cpi : -n 1 -host gallay  -path /export/home/baenni/example/PARALLEL/cpi 
_cpi : -n 2 -host gallay1  -path /export/home/baenni/example/PARALLEL/cpi 
_cpi



aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(821): MPI_Bcast(buf=0x8145480, count=1, MPI_INT, root=0, 
MPI_COMM_WORLD) failed
MPIR_Bcast(229):
MPIC_Send(48):
MPIC_Wait(308):
MPIDI_CH3_Progress_wait(207): an error occurred while handling an event 
returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(492):
connection_recv_fail(1728):
MPIDU_Socki_handle_read(590): connection closed by peer (set=0,sock=1)
aborting job:
Fatal error in MPI_Bcast: Internal MPI error!, error stack:
MPI_Bcast(821): MPI_Bcast(buf=1786e0, count=1, MPI_INT, root=0, 
MPI_COMM_WORLD) failed
MPIR_Bcast(197):
MPIC_Recv(98):
MPIC_Wait(308):
MPIDI_CH3_Progress_wait(207): an error occurred while handling an event 
returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(849): [ch3:sock] received packet of 
unknown type (369098752)
rank 4 in job 19  shaw_33110   caused collective abort of all ranks
  exit status of rank 4: killed by signal 9


