[Beowulf] slow mpi init/finalize

Michael Di Domenico mdidomenico4 at gmail.com
Tue Oct 17 06:51:43 PDT 2017


On Tue, Oct 17, 2017 at 8:54 AM, Peter Kjellström <cap at nsc.liu.se> wrote:
>> however, your test above fails on my machines
>>
>> user at n1# ib_acme -d n3
>> service: localhost
>> destination: n3
>> ib_acm_resolve_ip failed: cannot assign requested address
>> return status 0x0
>
> Did this fail instantly or with the typical ~1m timeout?

it fails instantly.

> If you have IntelMPI also try what I suggested and use the ucm dapl.
> For example for the first port on an mlx4 hca that's "ofa-v2-mlx4_0-1u".
>
> You can make sure that it comes first in your dat.conf (/etc/rdma
> or /etc/infiniband) or pass it explicitly to IntelMPI:
>
> I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u mpiexec.hydra ...
>
> You may want to set I_MPI_DEBUG=4 or so to see what it does.

i'll give this a whirl today hopefully
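
for anyone else following along, here is a rough sketch of the two checks
suggested above: see which provider is listed first in dat.conf, and build
the environment prefix for mpiexec.hydra. the helper names
(first_dat_provider, dapl_env) are made up for illustration, and it assumes
dat.conf has one provider per line with the provider name as the first
whitespace-separated field and '#' for comments. the path to dat.conf is
passed in because it differs between installs.

```shell
#!/bin/sh
# Hypothetical helper: print the first provider name found in a
# dat.conf-style file. Assumes one provider per line, name in the
# first field, blank lines and '#' comments skipped.
first_dat_provider() {
    awk '$1 != "" && $1 !~ /^#/ { print $1; exit }' "$1"
}

# Hypothetical helper: build the environment prefix from the thread.
# $1 = DAPL provider name, $2 = I_MPI_DEBUG level (defaults to 4).
dapl_env() {
    printf 'I_MPI_DAPL_PROVIDER=%s I_MPI_DEBUG=%s' "$1" "${2:-4}"
}
```

usage would be something like first_dat_provider /etc/rdma/dat.conf to see
what comes first, then prefixing the output of dapl_env ofa-v2-mlx4_0-1u
to the mpiexec.hydra command line (the provider name is just the example
from the quoted mail, yours will depend on the hca and port).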

