[bproc]MPI chokes

Jag agrajag@linuxpower.org
Thu, 15 Mar 2001 07:44:48 -0800


On Thu, 15 Mar 2001, Arthur H. Edwards wrote:

> Erik Arjan Hendriks wrote:
>
> > On Wed, Mar 14, 2001 at 04:44:29PM -0700, Art Edwards wrote:
> >
> >> I've installed Scyld on a small cluster and I'm trying to
> >> run the test programs that come with beompi
> >>
> >> The codes run on one node. However, when I try to run
> >> on multiple nodes I get the following error
> >>
> >> jarrett/home/edwardsa>mpirun -np 2 pi3p
> >> p0_28682:  p4_error: net_create_slave: bproc_rfork: -1
> >>     p4_error: latest msg from perror: Invalid argument
> >> jarrett/home/edwardsa>bm_list_28683:  p4_error: interrupt SIGINT: 2
> >>

<snip>

> >
> > BProc doesn't use any host names anywhere so nothing involving
> > hostnames will affect whether or not an rfork works.
> >
> > There's some other MPI issue going on here.
> >
> > - Erik
> >
>
> Thanks for the reply. The program dies in the PMPI_INIT phase. What
> should I be doing to figure this out?

Based on the error messages from your previous message, it looks like it
is trying to rfork to a node that is down.  What does the output of
'bpstat' on your cluster look like?


Jag
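
The diagnosis above rests on the perror text in the quoted mpirun output ("Invalid argument") that accompanies the -1 return from bproc_rfork. As a small aid for reading that message, the sketch below looks up which errno symbol the C library maps to that text. It only decodes the message; the assumption that bproc_rfork reports this error for a node that is down comes from the reply above and is not verified here. The variable names are illustrative.

```python
import errno
import os

# Text that p4 printed via perror in the quoted mpirun output.
reported = "Invalid argument"

# Collect every errno symbol whose C-library message matches that text.
matches = sorted(
    name
    for code, name in errno.errorcode.items()
    if os.strerror(code) == reported
)

# Print the symbolic names so they can be searched for in library sources.
print(matches)
```

Knowing the symbolic name makes it easier to search the BProc and MPI sources for the code path that returns it, alongside checking node state with bpstat as suggested.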
