[Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?
Prentice Bisbal
pbisbal at pppl.gov
Tue Apr 30 14:41:58 PDT 2019
I agree with Gus that you should be asking this question on the OpenMPI
mailing list, where there's more OpenMPI-specific expertise. Based on
your error message:
> mca_base_component_repository_open: unable to open mca_btl_usnic:
> libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
it looks like it's not a problem with the IB libraries in general, but that
you are missing the PSM libraries. PSM is an additional library used by
QLogic cards, so installing Mellanox OFED will not help you here. In
fact, it will probably just make things worse. Since Intel bought QLogic's
InfiniBand business a while back, I think you need to install the Intel PSM RPMs.
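For example, something along these lines (an assumption on my part: on
RHEL 7 I believe libpsm_infinipath.so.1 is shipped in the infinipath-psm
package -- verify before installing):

  # Confirm which package owns the missing library, then install it
  yum provides '*/libpsm_infinipath.so.1'
  yum install infinipath-psm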
If you are not using QLogic cards in your cluster, you will want to
rebuild OpenMPI without PSM support (or just tell Open MPI not to load
those components at runtime).
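A sketch of both approaches (the component names are taken from your
error output; the configure flags assume Open MPI 3.x, and the prefix
and application name are illustrative):

  # Runtime workaround: skip the PSM-dependent components
  mpirun --mca btl ^usnic --mca mtl ^psm,ofi -np 4 ./your_app

  # Rebuild without PSM support
  ./configure --prefix=/opt/openmpi-3.0.2 --without-psm --without-psm2
  make -j"$(nproc)" && make install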
Did you build this software on one system and then install it on a shared
filesystem, or copy it to the other nodes after the build? If so, the
build system probably has the proper libraries installed. The
configure command line for this build doesn't explicitly mention PSM:
> Configure command line: '--prefix=/usr/lib64/openmpi3' '--mandir=/usr/share/man/openmpi3-x86_64'
>   '--includedir=/usr/include/openmpi3-x86_64' '--sysconfdir=/etc/openmpi3-x86_64'
>   '--disable-silent-rules' '--enable-builtin-atomics' '--enable-mpi-cxx' '--with-sge'
>   '--with-valgrind' '--enable-memchecker' '--with-hwloc=/usr' 'CC=gcc' 'CXX=g++'
>   'LDFLAGS=-Wl,-z,relro ' 'CFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>   -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic'
>   'CXXFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
>   --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic'
>   'FC=gfortran' 'FCFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>   -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic'
When support for something shows up without being explicitly requested
on the configure line, that usually means the configure script probed
for the supporting libraries, found them, and enabled the feature
automatically.
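One way to see what actually got built, and which shared library a
failing plugin is looking for (the plugin path assumes the
/usr/lib64/openmpi3 prefix shown above):

  # List the network components compiled into this build
  ompi_info | grep -E 'MCA (btl|mtl)'

  # Show unresolved dependencies of the plugin named in the error
  ldd /usr/lib64/openmpi3/lib/openmpi/mca_mtl_psm.so | grep 'not found'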
Prentice
On 4/30/19 2:28 PM, Gus Correa wrote:
> Hi Faraz
>
> My impression is that you're missing the IB libraries, and that Open MPI
> was not built with IB support.
> This is very likely to be the case if you're using the Open MPI
> packages from CentOS (openmpi3.x86_64, openmpi3-devel.x86_64),
> which probably only have TCP/IP support built in (the common
> denominator network of most computers).
> Building Open MPI from source is not difficult, and a must if you have
> IB cards.
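> For example, a minimal source build with IB (verbs) support might look
> like this (a sketch -- the prefix is illustrative, adjust for your site):
>
>   ./configure --prefix=/opt/openmpi-3.0.2 --with-verbs --with-sge
>   make -j8 all && make install
>   /opt/openmpi-3.0.2/bin/ompi_info | grep openib   # should list the openib BTL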
>
> Notwithstanding the MPI expertise of the Beowulf mailing list
> subscribers,
> if you post your message in the Open MPI mailing list, you'll get
> specific and detailed help in no time,
> and minimize the suffering.
>
> My two cents,
> Gus Correa
>
> On Tue, Apr 30, 2019 at 12:28 PM Faraz Hussain <info at feacluster.com> wrote:
>
> Thanks, here is the output below:
>
> [hussaif1 at lustwzb34 ~]$ ompi_info
> [lustwzb34:10457] mca_base_component_repository_open: unable to open mca_btl_usnic:
>     libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
> [lustwzb34:10457] mca_base_component_repository_open: unable to open mca_mtl_ofi:
>     libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
> [lustwzb34:10457] mca_base_component_repository_open: unable to open mca_mtl_psm:
>     libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
> Package: Open MPI mockbuild at x86-041.build.eng.bos.redhat.com Distribution
> Open MPI: 3.0.2
> Open MPI repo revision: v3.0.2
> Open MPI release date: Jun 01, 2018
> Open RTE: 3.0.2
> Open RTE repo revision: v3.0.2
> Open RTE release date: Jun 01, 2018
> OPAL: 3.0.2
> OPAL repo revision: v3.0.2
> OPAL release date: Jun 01, 2018
> MPI API: 3.1.0
> Ident string: 3.0.2
> Prefix: /usr/lib64/openmpi3
> Configured architecture: x86_64-unknown-linux-gnu
> Configure host: x86-041.build.eng.bos.redhat.com
> Configured by: mockbuild
> Configured on: Wed Jun 13 14:18:03 EDT 2018
> Configure host: x86-041.build.eng.bos.redhat.com
> Configure command line: '--prefix=/usr/lib64/openmpi3' '--mandir=/usr/share/man/openmpi3-x86_64'
>   '--includedir=/usr/include/openmpi3-x86_64' '--sysconfdir=/etc/openmpi3-x86_64'
>   '--disable-silent-rules' '--enable-builtin-atomics' '--enable-mpi-cxx' '--with-sge'
>   '--with-valgrind' '--enable-memchecker' '--with-hwloc=/usr' 'CC=gcc' 'CXX=g++'
>   'LDFLAGS=-Wl,-z,relro ' 'CFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>   -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic'
>   'CXXFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
>   --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic'
>   'FC=gfortran' 'FCFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>   -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic'
> Built by: mockbuild
> Built on: Wed Jun 13 14:25:02 EDT 2018
> Built host: x86-041.build.eng.bos.redhat.com
> C bindings: yes
> C++ bindings: yes
> Fort mpif.h: yes (all)
> Fort use mpi: yes (limited: overloading)
> Fort use mpi size: deprecated-ompi-info-value
> Fort use mpi_f08: no
> Fort mpi_f08 compliance: The mpi_f08 module was not built
> Fort mpi_f08 subarrays: no
> Java bindings: no
> Wrapper compiler rpath: runpath
> C compiler: gcc
> C compiler absolute: /usr/bin/gcc
> C compiler family name: GNU
> C compiler version: 4.8.5
> C++ compiler: g++
> C++ compiler absolute: /usr/bin/g++
> Fort compiler: gfortran
> Fort compiler abs: /usr/bin/gfortran
> Fort ignore TKR: no
> Fort 08 assumed shape: no
> Fort optional args: no
> Fort INTERFACE: yes
> Fort ISO_FORTRAN_ENV: yes
> Fort STORAGE_SIZE: no
> Fort BIND(C) (all): no
> Fort ISO_C_BINDING: yes
> Fort SUBROUTINE BIND(C): no
> Fort TYPE,BIND(C): no
> Fort T,BIND(C,name="a"): no
> Fort PRIVATE: no
> Fort PROTECTED: no
> Fort ABSTRACT: no
> Fort ASYNCHRONOUS: no
> Fort PROCEDURE: no
> Fort USE...ONLY: no
> Fort C_FUNLOC: no
> Fort f08 using wrappers: no
> Fort MPI_SIZEOF: no
> C profiling: yes
> C++ profiling: yes
> Fort mpif.h profiling: yes
> Fort use mpi profiling: yes
> Fort use mpi_f08 prof: no
> C++ exceptions: no
> Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
>     OMPI progress: no, ORTE progress: yes, Event lib: yes)
> Sparse Groups: no
> Internal debug support: no
> MPI interface warnings: yes
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> dl support: yes
> Heterogeneous support: no
> mpirun default --prefix: no
> MPI I/O support: yes
> MPI_WTIME support: native
> Symbol vis. support: yes
> Host topology support: yes
> MPI extensions: affinity, cuda
> FT Checkpoint support: no (checkpoint thread: no)
> C/R Enabled Debugging: no
> MPI_MAX_PROCESSOR_NAME: 256
> MPI_MAX_ERROR_STRING: 256
> MPI_MAX_OBJECT_NAME: 64
> MPI_MAX_INFO_KEY: 36
> MPI_MAX_INFO_VAL: 256
> MPI_MAX_PORT_NAME: 1024
> MPI_MAX_DATAREP_STRING: 128
> MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA btl: openib (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA btl: self (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA btl: tcp (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA crs: none (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA event: libevent2022 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA hwloc: external (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA memchecker: valgrind (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA pmix: pmix2x (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v3.0.2)
> MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA dfs: app (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA dfs: orted (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA dfs: test (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA errmgr: dvm (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA ess: env (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA notifier: syslog (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA odls: default (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA oob: ud (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA routed: debruijn (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA state: app (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA state: dvm (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA state: novm (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA state: orted (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA state: tool (MCA v2.1.0, API v1.0.0, Component v3.0.2)
> MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA coll: self (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA fcoll: static (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA io: romio314 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA mtl: psm2 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v3.0.2)
> MCA pml: v (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
> MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component v3.0.2)
> MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v3.0.2)
> MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component v3.0.2)
>
>
> Quoting John Hearns <hearnsj at googlemail.com>:
>
> > Hello Faraz. Please start by running this command: ompi_info
> >
> > On Tue, 30 Apr 2019 at 15:15, Faraz Hussain <info at feacluster.com> wrote:
> >
> >> I installed RedHat 7.5 on two machines with the following Mellanox cards:
> >>
> >> 87:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
> >>
> >> I followed the steps outlined here to verify RDMA is working:
> >>
> >> https://community.mellanox.com/s/article/howto-enable-perftest-package-for-upstream-kernel
> >>
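> >> (A sketch of the kind of perftest sanity check that article describes --
> >> hostname illustrative; run one side on each node:
> >>
> >>   ib_write_bw              # first node, acts as server
> >>   ib_write_bw lustwzb34    # second node, connects to the first
> >>
> >> Sane bandwidth numbers on both ends confirm RDMA itself is working.)
> >>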
> >> However, I cannot seem to get Open MPI 3.0.2 to work. When I run it, I get this error:
> >>
> >> --------------------------------------------------------------------------
> >> No OpenFabrics connection schemes reported that they were able to be
> >> used on a specific port. As such, the openib BTL (OpenFabrics
> >> support) will be disabled for this port.
> >>
> >>   Local host:      lustwzb34
> >>   Local device:    mlx4_0
> >>   Local port:      1
> >>   CPCs attempted:  rdmacm, udcm
> >> --------------------------------------------------------------------------
> >>
> >> Then it just hangs till I press control C.
> >>
> >> I understand this may be an issue with RedHat, Open MPI, or Mellanox.
> >> Any ideas on how to narrow down which one it is?
> >>
> >> Thanks!
> >>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf