[Beowulf] slow mpi init/finalize

Michael Di Domenico mdidomenico4 at gmail.com
Mon Oct 16 10:11:37 PDT 2017


On Mon, Oct 16, 2017 at 7:16 AM, Peter Kjellström <cap at nsc.liu.se> wrote:
> Another is that your MPIs tried to use rdmacm and that in turn tried to
> use ibacm which, if incorrectly setup, times out after ~1m. You can
> verify ibacm functionality by running for example:
>
> user@n1 $ ib_acme -d n2
> ...
> user@n1 $
>
> This should be near instant if ibacm works as it should.

I didn't specifically tell MPI to use one connection setup vs. another,
but I'll see if I can track down what Open MPI is doing in that regard.
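
For what it's worth, if it does turn out to be rdmacm, my understanding
is that Open MPI's openib BTL lets you see and override which connection
manager it picks. A rough sketch, assuming a 2017-era Open MPI where
btl_openib_cpc_include is the relevant parameter and ./mpi_hello stands
in for any MPI test program:

user@n1 $ ompi_info --param btl openib --level 9 | grep cpc
user@n1 $ mpirun --mca btl_openib_cpc_include udcm -np 2 ./mpi_hello

If forcing udcm makes the ~1m startup delay disappear, that would point
the finger at rdmacm/ibacm.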

However, your test above fails on my machines:

user@n1# ib_acme -d n3
service: localhost
destination: n3
ib_acm_resolve_ip failed: cannot assign requested address
return status 0x0
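
That error makes me think ib_acme can't reach the ibacm daemon at all,
or that the daemon doesn't know the address. Assuming a systemd-based
box where the service is named ibacm, checking and bouncing the daemon
should be as simple as:

user@n1# systemctl status ibacm      # is the daemon even running?
user@n1# systemctl restart ibacm     # restart after any config change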

The /etc/rdma/ibacm_addr.cfg file just lists the data specific to each
host, which was gathered with ib_acme -A.
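
For reference, the entries in that file look roughly like the sketch
below; the hostnames, device name, and pkey values here are made up,
and the column layout (address, device, port, pkey) is just my reading
of the template that ib_acme -A spits out:

# /etc/rdma/ibacm_addr.cfg (hypothetical entries)
# address       device   port  pkey
n3              mlx4_0   1     default
192.168.1.3     mlx4_0   1     default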

Truthfully, I never configured it; I thought it just "worked" on its
own, but perhaps not. I'll have to google some.
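
If it comes to that, my understanding is that ib_acme can also write
fresh template config files, which seems like a sane starting point
(assuming rdma-core's ib_acme, where -A and -O take optional output
paths):

user@n1# ib_acme -A /etc/rdma/ibacm_addr.cfg -O /etc/rdma/ibacm_opts.cfg
user@n1# systemctl restart ibacm
user@n1# ib_acme -d n3               # should now return near instantly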

