[Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?
Gus Correa
gus at ldeo.columbia.edu
Tue Apr 30 11:28:53 PDT 2019
Hi Faraz,
My impression is that you're missing the IB libraries, and that your Open MPI
was not built with IB support.
This is very likely the case if you're using the Open MPI packages
shipped with CentOS (openmpi3.x86_64, openmpi3-devel.x86_64),
which probably have only TCP/IP support built in (the common-denominator
network of most computers).
Building Open MPI from source is not difficult, and it is a must if you have
IB cards; a sketch follows.
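Something along these lines should do it (just a sketch, assuming
RHEL/CentOS 7 package names and picking /opt/openmpi-3.0.2 as an
arbitrary install prefix):

    # IB user-space libraries and development headers
    sudo yum install libibverbs-devel librdmacm-devel

    # build Open MPI 3.0.2 with verbs (openib) support
    tar xjf openmpi-3.0.2.tar.bz2
    cd openmpi-3.0.2
    ./configure --prefix=/opt/openmpi-3.0.2 --with-verbs
    make -j 8
    sudo make install

Afterwards, put the new prefix first in your PATH and LD_LIBRARY_PATH,
and re-run ompi_info to confirm the openib BTL is listed.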
Notwithstanding the MPI expertise of the Beowulf mailing list subscribers,
if you post your message to the Open MPI mailing list, you'll get specific
and detailed help in no time, and minimize the suffering.
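Two more observations. First, the libpsm_infinipath.so.1 warnings at the
top of your ompi_info output come from components for other fabrics (the
Intel/QLogic TrueScale PSM MTL and components linked against it), not
from anything Mellanox, so they should be harmless. If you want to
silence them, installing the PSM library should do it (the package name
below is the RHEL/CentOS 7 one; verify on your system):

    sudo yum install infinipath-psm

Second, for the hang itself, turning up the BTL verbosity often shows
where the openib connection setup stalls (./your_mpi_program below is a
placeholder for your executable):

    mpirun -np 2 --mca btl openib,self,vader --mca btl_base_verbose 100 ./your_mpi_program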
My two cents,
Gus Correa
On Tue, Apr 30, 2019 at 12:28 PM Faraz Hussain <info at feacluster.com> wrote:
> Thanks, here is the output below:
>
> [hussaif1 at lustwzb34 ~]$ ompi_info
> [lustwzb34:10457] mca_base_component_repository_open: unable to open mca_btl_usnic: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
> [lustwzb34:10457] mca_base_component_repository_open: unable to open mca_mtl_ofi: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
> [lustwzb34:10457] mca_base_component_repository_open: unable to open mca_mtl_psm: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
> Package: Open MPI mockbuild at x86-041.build.eng.bos.redhat.com Distribution
> Open MPI: 3.0.2
> Open MPI repo revision: v3.0.2
> Open MPI release date: Jun 01, 2018
> Open RTE: 3.0.2
> Open RTE repo revision: v3.0.2
> Open RTE release date: Jun 01, 2018
> OPAL: 3.0.2
> OPAL repo revision: v3.0.2
> OPAL release date: Jun 01, 2018
> MPI API: 3.1.0
> Ident string: 3.0.2
> Prefix: /usr/lib64/openmpi3
> Configured architecture: x86_64-unknown-linux-gnu
> Configure host: x86-041.build.eng.bos.redhat.com
> Configured by: mockbuild
> Configured on: Wed Jun 13 14:18:03 EDT 2018
> Configure host: x86-041.build.eng.bos.redhat.com
> Configure command line: '--prefix=/usr/lib64/openmpi3' '--mandir=/usr/share/man/openmpi3-x86_64' '--includedir=/usr/include/openmpi3-x86_64' '--sysconfdir=/etc/openmpi3-x86_64' '--disable-silent-rules' '--enable-builtin-atomics' '--enable-mpi-cxx' '--with-sge' '--with-valgrind' '--enable-memchecker' '--with-hwloc=/usr' 'CC=gcc' 'CXX=g++' 'LDFLAGS=-Wl,-z,relro ' 'CFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' 'CXXFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' 'FC=gfortran' 'FCFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic'
> Built by: mockbuild
> Built on: Wed Jun 13 14:25:02 EDT 2018
> Built host: x86-041.build.eng.bos.redhat.com
> C bindings: yes
> C++ bindings: yes
> Fort mpif.h: yes (all)
> Fort use mpi: yes (limited: overloading)
> Fort use mpi size: deprecated-ompi-info-value
> Fort use mpi_f08: no
> Fort mpi_f08 compliance: The mpi_f08 module was not built
> Fort mpi_f08 subarrays: no
> Java bindings: no
> Wrapper compiler rpath: runpath
> C compiler: gcc
> C compiler absolute: /usr/bin/gcc
> C compiler family name: GNU
> C compiler version: 4.8.5
> C++ compiler: g++
> C++ compiler absolute: /usr/bin/g++
> Fort compiler: gfortran
> Fort compiler abs: /usr/bin/gfortran
> Fort ignore TKR: no
> Fort 08 assumed shape: no
> Fort optional args: no
> Fort INTERFACE: yes
> Fort ISO_FORTRAN_ENV: yes
> Fort STORAGE_SIZE: no
> Fort BIND(C) (all): no
> Fort ISO_C_BINDING: yes
> Fort SUBROUTINE BIND(C): no
> Fort TYPE,BIND(C): no
> Fort T,BIND(C,name="a"): no
> Fort PRIVATE: no
> Fort PROTECTED: no
> Fort ABSTRACT: no
> Fort ASYNCHRONOUS: no
> Fort PROCEDURE: no
> Fort USE...ONLY: no
> Fort C_FUNLOC: no
> Fort f08 using wrappers: no
> Fort MPI_SIZEOF: no
> C profiling: yes
> C++ profiling: yes
> Fort mpif.h profiling: yes
> Fort use mpi profiling: yes
> Fort use mpi_f08 prof: no
> C++ exceptions: no
> Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)
> Sparse Groups: no
> Internal debug support: no
> MPI interface warnings: yes
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> dl support: yes
> Heterogeneous support: no
> mpirun default --prefix: no
> MPI I/O support: yes
> MPI_WTIME support: native
> Symbol vis. support: yes
> Host topology support: yes
> MPI extensions: affinity, cuda
> FT Checkpoint support: no (checkpoint thread: no)
> C/R Enabled Debugging: no
> MPI_MAX_PROCESSOR_NAME: 256
> MPI_MAX_ERROR_STRING: 256
> MPI_MAX_OBJECT_NAME: 64
> MPI_MAX_INFO_KEY: 36
> MPI_MAX_INFO_VAL: 256
> MPI_MAX_PORT_NAME: 1024
> MPI_MAX_DATAREP_STRING: 128
> MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA btl: openib (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA btl: self (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA btl: tcp (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA crs: none (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA event: libevent2022 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA hwloc: external (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA memchecker: valgrind (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA pmix: pmix2x (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v3.0.2)
> MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA dfs: app (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA dfs: orted (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA dfs: test (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA errmgr: dvm (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA ess: env (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA notifier: syslog (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA odls: default (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA oob: ud (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA routed: debruijn (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA state: app (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA state: dvm (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA state: novm (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA state: orted (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA state: tool (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA coll: self (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA fcoll: static (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA io: romio314 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA mtl: psm2 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA pml: v (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component v3.0.2)
> MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v3.0.2)
> MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component v3.0.2)
>
>
> Quoting John Hearns <hearnsj at googlemail.com>:
>
> > Hello Faraz. Please start by running this command: ompi_info
> >
> > On Tue, 30 Apr 2019 at 15:15, Faraz Hussain <info at feacluster.com> wrote:
> >
> >> I installed RedHat 7.5 on two machines with the following Mellanox cards:
> >>
> >> 87:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
> >>
> >> I followed the steps outlined here to verify RDMA is working:
> >>
> >> https://community.mellanox.com/s/article/howto-enable-perftest-package-for-upstream-kernel
> >>
> >> However, I cannot seem to get Open MPI 3.0.2 to work. When I run it, I
> >> get this error:
> >>
> >> --------------------------------------------------------------------------
> >> No OpenFabrics connection schemes reported that they were able to be
> >> used on a specific port. As such, the openib BTL (OpenFabrics
> >> support) will be disabled for this port.
> >>
> >> Local host: lustwzb34
> >> Local device: mlx4_0
> >> Local port: 1
> >> CPCs attempted: rdmacm, udcm
> >> --------------------------------------------------------------------------
> >>
> >> Then it just hangs till I press control C.
> >>
> >> I understand this may be an issue with RedHat, Open MPI or Mellanox.
> >> Any ideas on how to debug which of them is at fault?
> >>
> >> Thanks!