[Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?
Faraz Hussain
info at feacluster.com
Tue Apr 30 09:28:14 PDT 2019
Thanks. Here is the output:
[hussaif1 at lustwzb34 ~]$ ompi_info
[lustwzb34:10457] mca_base_component_repository_open: unable to open
mca_btl_usnic: libpsm_infinipath.so.1: cannot open shared object file:
No such file or directory (ignored)
[lustwzb34:10457] mca_base_component_repository_open: unable to open
mca_mtl_ofi: libpsm_infinipath.so.1: cannot open shared object file:
No such file or directory (ignored)
[lustwzb34:10457] mca_base_component_repository_open: unable to open
mca_mtl_psm: libpsm_infinipath.so.1: cannot open shared object file:
No such file or directory (ignored)
Package: Open MPI mockbuild at x86-041.build.eng.bos.redhat.com
Distribution
Open MPI: 3.0.2
Open MPI repo revision: v3.0.2
Open MPI release date: Jun 01, 2018
Open RTE: 3.0.2
Open RTE repo revision: v3.0.2
Open RTE release date: Jun 01, 2018
OPAL: 3.0.2
OPAL repo revision: v3.0.2
OPAL release date: Jun 01, 2018
MPI API: 3.1.0
Ident string: 3.0.2
Prefix: /usr/lib64/openmpi3
Configured architecture: x86_64-unknown-linux-gnu
Configure host: x86-041.build.eng.bos.redhat.com
Configured by: mockbuild
Configured on: Wed Jun 13 14:18:03 EDT 2018
Configure host: x86-041.build.eng.bos.redhat.com
Configure command line: '--prefix=/usr/lib64/openmpi3'
'--mandir=/usr/share/man/openmpi3-x86_64'
'--includedir=/usr/include/openmpi3-x86_64'
'--sysconfdir=/etc/openmpi3-x86_64'
'--disable-silent-rules' '--enable-builtin-atomics'
'--enable-mpi-cxx' '--with-sge' '--with-valgrind'
'--enable-memchecker' '--with-hwloc=/usr' 'CC=gcc'
'CXX=g++' 'LDFLAGS=-Wl,-z,relro ' 'CFLAGS= -O2 -g
-pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector-strong --param=ssp-buffer-size=4
-grecord-gcc-switches -m64 -mtune=generic'
'CXXFLAGS= -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector-strong --param=ssp-buffer-size=4
-grecord-gcc-switches -m64 -mtune=generic'
'FC=gfortran' 'FCFLAGS= -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector-strong --param=ssp-buffer-size=4
-grecord-gcc-switches -m64 -mtune=generic'
Built by: mockbuild
Built on: Wed Jun 13 14:25:02 EDT 2018
Built host: x86-041.build.eng.bos.redhat.com
C bindings: yes
C++ bindings: yes
Fort mpif.h: yes (all)
Fort use mpi: yes (limited: overloading)
Fort use mpi size: deprecated-ompi-info-value
Fort use mpi_f08: no
Fort mpi_f08 compliance: The mpi_f08 module was not built
Fort mpi_f08 subarrays: no
Java bindings: no
Wrapper compiler rpath: runpath
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C compiler family name: GNU
C compiler version: 4.8.5
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fort compiler: gfortran
Fort compiler abs: /usr/bin/gfortran
Fort ignore TKR: no
Fort 08 assumed shape: no
Fort optional args: no
Fort INTERFACE: yes
Fort ISO_FORTRAN_ENV: yes
Fort STORAGE_SIZE: no
Fort BIND(C) (all): no
Fort ISO_C_BINDING: yes
Fort SUBROUTINE BIND(C): no
Fort TYPE,BIND(C): no
Fort T,BIND(C,name="a"): no
Fort PRIVATE: no
Fort PROTECTED: no
Fort ABSTRACT: no
Fort ASYNCHRONOUS: no
Fort PROCEDURE: no
Fort USE...ONLY: no
Fort C_FUNLOC: no
Fort f08 using wrappers: no
Fort MPI_SIZEOF: no
C profiling: yes
C++ profiling: yes
Fort mpif.h profiling: yes
Fort use mpi profiling: yes
Fort use mpi_f08 prof: no
C++ exceptions: no
Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
OMPI progress: no, ORTE progress: yes, Event lib:
yes)
Sparse Groups: no
Internal debug support: no
MPI interface warnings: yes
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
dl support: yes
Heterogeneous support: no
mpirun default --prefix: no
MPI I/O support: yes
MPI_WTIME support: native
Symbol vis. support: yes
Host topology support: yes
MPI extensions: affinity, cuda
FT Checkpoint support: no (checkpoint thread: no)
C/R Enabled Debugging: no
MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128
MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA btl: openib (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA btl: self (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA btl: tcp (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA crs: none (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v3.0.2)
MCA event: libevent2022 (MCA v2.1.0, API v2.0.0, Component
v3.0.2)
MCA hwloc: external (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
v3.0.2)
MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
v3.0.2)
MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA memchecker: valgrind (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
v3.0.2)
MCA pmix: pmix2x (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v3.0.2)
MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA dfs: app (MCA v2.1.0, API v1.0.0, Component v3.0.2)
MCA dfs: orted (MCA v2.1.0, API v1.0.0, Component v3.0.2)
MCA dfs: test (MCA v2.1.0, API v1.0.0, Component v3.0.2)
MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component
v3.0.2)
MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component
v3.0.2)
MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component
v3.0.2)
MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component
v3.0.2)
MCA errmgr: dvm (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA ess: env (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component
v3.0.2)
MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA notifier: syslog (MCA v2.1.0, API v1.0.0, Component v3.0.2)
MCA odls: default (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA oob: ud (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component
v3.0.2)
MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component
v3.0.2)
MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component
v3.0.2)
MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component
v3.0.2)
MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component
v3.0.2)
MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA routed: debruijn (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v3.0.2)
MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v3.0.2)
MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v3.0.2)
MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v3.0.2)
MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v3.0.2)
MCA state: app (MCA v2.1.0, API v1.0.0, Component v3.0.2)
MCA state: dvm (MCA v2.1.0, API v1.0.0, Component v3.0.2)
MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v3.0.2)
MCA state: novm (MCA v2.1.0, API v1.0.0, Component v3.0.2)
MCA state: orted (MCA v2.1.0, API v1.0.0, Component v3.0.2)
MCA state: tool (MCA v2.1.0, API v1.0.0, Component v3.0.2)
MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA coll: self (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
v3.0.2)
MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
v3.0.2)
MCA fcoll: static (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component
v3.0.2)
MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA io: romio314 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA mtl: psm2 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v3.0.2)
MCA pml: v (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component
v3.0.2)
MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
v3.0.2)
MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
v3.0.2)
MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
v3.0.2)
MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v3.0.2)
MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
v3.0.2)
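
The three warnings at the top of this output all name the same missing library. A minimal sketch for pulling that out of a long dump is below. It assumes the warnings arrive unwrapped, one per line, as they do on the terminal (the copies above were wrapped by the mail client); the function name failed_components is made up for illustration.

```shell
# Hypothetical helper: list each MCA component that failed to load,
# together with the shared library it could not find.
# Reads ompi_info or mpirun diagnostics on stdin, one warning per line.
failed_components() {
    sed -n 's/.*mca_base_component_repository_open: unable to open \([^:]*\): \([^:]*\): cannot open shared object file.*/\1 \2/p'
}

# Typical use (merging stderr in case the warnings are printed there):
#   ompi_info 2>&1 | failed_components
```

If the same library shows up for every component, `ldconfig -p | grep libpsm_infinipath` would show whether the dynamic linker can see it at all. The warnings are marked "(ignored)", so they may well be unrelated to the openib error quoted below.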
Quoting John Hearns <hearnsj at googlemail.com>:
> Hello Faraz. Please start by running this command: ompi_info
>
> On Tue, 30 Apr 2019 at 15:15, Faraz Hussain <info at feacluster.com> wrote:
>
>> I installed Red Hat 7.5 on two machines with the following Mellanox cards:
>>
>> 87:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
>>
>> I followed the steps outlined here to verify RDMA is working:
>>
>>
>> https://community.mellanox.com/s/article/howto-enable-perftest-package-for-upstream-kernel
>>
>> However, I cannot seem to get Open MPI 3.0.2 to work. When I run it, I
>> get this error:
>>
>> --------------------------------------------------------------------------
>> No OpenFabrics connection schemes reported that they were able to be
>> used on a specific port. As such, the openib BTL (OpenFabrics
>> support) will be disabled for this port.
>>
>>   Local host:      lustwzb34
>>   Local device:    mlx4_0
>>   Local port:      1
>>   CPCs attempted:  rdmacm, udcm
>> --------------------------------------------------------------------------
>>
>> Then it just hangs until I press Ctrl-C.
>>
>> I understand this may be an issue with Red Hat, Open MPI, or Mellanox.
>> Any ideas on how to narrow down which one is responsible?
>>
>> Thanks!
>>