[bproc] MPI chokes

Arthur H. Edwards edwards@icantbelieveimdoingthis.com
Thu, 15 Mar 2001 09:15:48 -0700


Jag wrote:

> On Thu, 15 Mar 2001, Arthur H. Edwards wrote:
> 
> 
>> Erik Arjan Hendriks wrote:
>> 
>> 
>>> On Wed, Mar 14, 2001 at 04:44:29PM -0700, Art Edwards wrote:
>>> 
>>> 
>>>> I've installed Scyld on a small cluster and I'm trying to
>>>> run the test programs that come with beompi.
>>>> 
>>>> The codes run on one node. However, when I try to run
>>>> on multiple nodes, I get the following error:
>>>> 
>>>> jarrett/home/edwardsa>mpirun -np 2 pi3p
>>>> p0_28682:  p4_error: net_create_slave: bproc_rfork: -1
>>>>     p4_error: latest msg from perror: Invalid argument
>>>> jarrett/home/edwardsa>bm_list_28683:  p4_error: interrupt SIGINT: 2
>>>> 
>>> 
> <snip>
> 
>>> BProc doesn't use any host names anywhere, so nothing involving
>>> hostnames will affect whether or not an rfork works.
>>> 
>>> There's some other MPI issue going on here.
>>> 
>>> - Erik
>>> 
>> 
>> Thanks for the reply. The program dies in the PMPI_INIT phase. What 
>> should I be doing to figure this out?
> 
> Based on the error messages from your previous message, it looks like it
> is trying to rfork to a node that is down.  What does the output of
> 'bpstat' on your cluster look like?
> 
> 
> Jag

Here is the output from bpstat:

jarrett/home/edwardsa>bpstat
Node    Address         Status
0       192.168.1.100   up
1       192.168.1.101   up
2       192.168.1.102   up
3       192.168.1.103   up
4       192.168.1.104   up
5       192.168.1.105   up
6       192.168.1.106   up
7       192.168.1.107   down
8       192.168.1.108   down
9       192.168.1.109   down
10      192.168.1.110   down
11      192.168.1.111   down
12      192.168.1.112   down
13      192.168.1.113   down
14      192.168.1.114   down
15      192.168.1.115   down
16      192.168.1.116   down
17      192.168.1.117   down
18      192.168.1.118   down
19      192.168.1.119   down
20      192.168.1.120   down
21      192.168.1.121   down
22      192.168.1.122   down
23      192.168.1.123   down
24      192.168.1.124   down
25      192.168.1.125   down
26      192.168.1.126   down
27      192.168.1.127   down
28      192.168.1.128   down
29      192.168.1.129   down
30      192.168.1.130   down
31      192.168.1.131   down
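
In the listing above, nodes 0 through 6 are up and nodes 7 through 31 are down. A listing like this can be summarized with a few lines of Python. This is only a rough sketch: it assumes the three-column "Node / Address / Status" layout shown in this message, and parse_bpstat and nodes_with_status are hypothetical helper names, not part of BProc or beompi.

```python
def parse_bpstat(text):
    """Parse a three-column bpstat listing into (node, address, status) tuples.

    Assumes the layout shown in this message: node number, IP address, status.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, shell prompts, and anything that is not
        # "<integer> <address> <status>".
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        rows.append((int(fields[0]), fields[1], fields[2]))
    return rows


def nodes_with_status(rows, status):
    """Return the node numbers whose status column equals `status`."""
    return [node for node, _addr, st in rows if st == status]


# A shortened sample in the same layout as the listing above.
sample = """\
Node    Address         Status
0       192.168.1.100   up
6       192.168.1.106   up
7       192.168.1.107   down
"""

rows = parse_bpstat(sample)
print("up:  ", nodes_with_status(rows, "up"))
print("down:", nodes_with_status(rows, "down"))
```

Feeding it the saved output of bpstat instead of the sample string would give the full list of nodes that are candidates for a remote fork, which can then be compared with the number of processes requested from mpirun.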


Art Edwards