[bproc]MPI chokes
Arthur H. Edwards,1,505-853-6042,505-256-0834 edwards at icantbelieveimdoingthis.com
Thu Mar 15 08:15:48 PST 2001
Jag wrote:

> On Thu, 15 Mar 2001, Arthur H. Edwards,1,505-853-6042,505-256-0834 wrote:
>
>> Erik Arjan Hendriks wrote:
>>
>>> On Wed, Mar 14, 2001 at 04:44:29PM -0700, Art Edwards wrote:
>>>
>>>> I've installed Scyld on a small cluster and I'm trying to
>>>> run the test programs that come with beompi
>>>>
>>>> The codes run on one node. However, when I try to run
>>>> on multiple nodes I get the following error
>>>>
>>>> jarrett/home/edwardsa>mpirun -np 2 pi3p
>>>> p0_28682: p4_error: net_create_slave: bproc_rfork: -1
>>>> p4_error: latest msg from perror: Invalid argument
>>>> jarrett/home/edwardsa>bm_list_28683: p4_error: interrupt SIGINT: 2
>>>>
>
> <snip>
>
>>> BProc doesn't use any host names anywhere so nothing involving
>>> hostnames will affect whether or an rfork works.
>>>
>>> There's some other MPI issue going on here.
>>>
>>> - Erik
>>
>> Thanks for the reply. The program dies in the PMPI_INIT phase. What
>> should I be doing to figure this out?
>
> Based on the error messages from your previous message, it looks like it
> is trying to rfork to a node that is down. What does the output of
> 'bpstat' on your cluster look like?
>
> Jag

Here is the output from bpstat

jarrett/home/edwardsa>bpstat
Node  Address        Status
0     192.168.1.100  up
1     192.168.1.101  up
2     192.168.1.102  up
3     192.168.1.103  up
4     192.168.1.104  up
5     192.168.1.105  up
6     192.168.1.106  up
7     192.168.1.107  down
8     192.168.1.108  down
9     192.168.1.109  down
10    192.168.1.110  down
11    192.168.1.111  down
12    192.168.1.112  down
13    192.168.1.113  down
14    192.168.1.114  down
15    192.168.1.115  down
16    192.168.1.116  down
17    192.168.1.117  down
18    192.168.1.118  down
19    192.168.1.119  down
20    192.168.1.120  down
21    192.168.1.121  down
22    192.168.1.122  down
23    192.168.1.123  down
24    192.168.1.124  down
25    192.168.1.125  down
26    192.168.1.126  down
27    192.168.1.127  down
28    192.168.1.128  down
29    192.168.1.129  down
30    192.168.1.130  down
31    192.168.1.131  down

Art Edwards
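
A long bpstat listing like the one in this message is easier to check if the nodes reported "up" are pulled out automatically. The following is a minimal Python sketch, assuming the three-column "Node Address Status" layout shown in this message. The helper names parse_bpstat and up_nodes are hypothetical and are not part of BProc or Scyld, and the sample text is a shortened stand-in for real output. It only summarizes the listing and says nothing about why the rfork failed.

```python
import re

# Matches one data row: node number, dotted IPv4 address, status word.
# Assumption: rows look like "7     192.168.1.107  down", as in the message.
ROW = re.compile(r"\s*(\d+)\s+(\d+\.\d+\.\d+\.\d+)\s+(\w+)\s*$")


def parse_bpstat(text):
    """Return {node_number: status} for every row that matches ROW.

    Header lines and anything else that does not match are skipped.
    """
    nodes = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            nodes[int(match.group(1))] = match.group(3)
    return nodes


def up_nodes(text):
    """Return the sorted node numbers whose status is exactly 'up'."""
    return sorted(n for n, status in parse_bpstat(text).items() if status == "up")


# Shortened, illustrative sample in the same layout as the listing.
SAMPLE = """\
Node  Address        Status
0     192.168.1.100  up
1     192.168.1.101  up
7     192.168.1.107  down
"""

if __name__ == "__main__":
    print("nodes up:", up_nodes(SAMPLE))
```

To use it on a real cluster, one could save the output of bpstat to a file and pass the file contents to up_nodes in place of SAMPLE.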
More information about the Beowulf mailing list
