[Beowulf] slow mpi init/finalize
Michael Di Domenico
mdidomenico4 at gmail.com
Mon Oct 16 10:11:37 PDT 2017
On Mon, Oct 16, 2017 at 7:16 AM, Peter Kjellström <cap at nsc.liu.se> wrote:
> Another is that your MPIs tried to use rdmacm and that in turn tried to
> use ibacm which, if incorrectly setup, times out after ~1m. You can
> verify ibacm functionality by running for example:
>
> user at n1 $ ib_acme -d n2
> ...
> user at n1 $
>
> This should be near instant if ibacm works as it should.
i didn't specifically tell mpi to use one connection setup vs another,
but i'll see if i can track down what openmpi is doing in that regard.
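(for what it's worth, assuming openmpi's openib btl is the one in play,
the connection manager it picks can be inspected and overridden through
the btl_openib_cpc_include/exclude mca params -- untested sketch,
./hello_mpi is just a stand-in for any mpi binary:)

user at n1 $ ompi_info --param btl openib --level 9 | grep cpc
user at n1 $ mpirun --mca btl_openib_cpc_exclude rdmacm -np 2 ./hello_mpi

if the ~1m hang goes away with rdmacm excluded, that would point the
finger at ibacm.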
however, your test above fails on my machines:
user at n1# ib_acme -d n3
service: localhost
destination: n3
ib_acm_resolve_ip failed: cannot assign requested address
return status 0x0
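(that error can apparently also show up when the ibacm daemon itself
isn't running, so checking the service first seems worthwhile --
assuming a systemd box:)

user at n1 $ systemctl status ibacm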
the /etc/rdma/ibacm_addr.cfg file just lists the data specific
to each host, which is gathered by ib_acme -A

truthfully i never configured it, i thought it just "worked" on its
own, but perhaps not. i'll have to google some
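(if the googling pans out, the plan would roughly be: regenerate the
address file with ib_acme -A, drop it into /etc/rdma, and restart the
daemon -- untested sketch, default file name per the ibacm man page:)

user at n1 $ ib_acme -A                           # writes a sample ibacm_addr.cfg
user at n1 # cp ibacm_addr.cfg /etc/rdma/ibacm_addr.cfg
user at n1 # systemctl restart ibacm
user at n1 $ ib_acme -d n3                        # should now return near instantly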