[Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

Faraz Hussain info at feacluster.com
Tue Apr 30 09:28:14 PDT 2019


Thanks, here is the output below:

[hussaif1 at lustwzb34 ~]$ ompi_info
[lustwzb34:10457] mca_base_component_repository_open: unable to open mca_btl_usnic: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
[lustwzb34:10457] mca_base_component_repository_open: unable to open mca_mtl_ofi: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
[lustwzb34:10457] mca_base_component_repository_open: unable to open mca_mtl_psm: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
                  Package: Open MPI mockbuild at x86-041.build.eng.bos.redhat.com
                           Distribution
                 Open MPI: 3.0.2
   Open MPI repo revision: v3.0.2
    Open MPI release date: Jun 01, 2018
                 Open RTE: 3.0.2
   Open RTE repo revision: v3.0.2
    Open RTE release date: Jun 01, 2018
                     OPAL: 3.0.2
       OPAL repo revision: v3.0.2
        OPAL release date: Jun 01, 2018
                  MPI API: 3.1.0
             Ident string: 3.0.2
                   Prefix: /usr/lib64/openmpi3
  Configured architecture: x86_64-unknown-linux-gnu
           Configure host: x86-041.build.eng.bos.redhat.com
            Configured by: mockbuild
            Configured on: Wed Jun 13 14:18:03 EDT 2018
           Configure host: x86-041.build.eng.bos.redhat.com
   Configure command line: '--prefix=/usr/lib64/openmpi3'
                           '--mandir=/usr/share/man/openmpi3-x86_64'
                           '--includedir=/usr/include/openmpi3-x86_64'
                           '--sysconfdir=/etc/openmpi3-x86_64'
                           '--disable-silent-rules' '--enable-builtin-atomics'
                           '--enable-mpi-cxx' '--with-sge' '--with-valgrind'
                           '--enable-memchecker' '--with-hwloc=/usr' 'CC=gcc'
                           'CXX=g++' 'LDFLAGS=-Wl,-z,relro ' 'CFLAGS= -O2 -g
                           -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
                           -fstack-protector-strong --param=ssp-buffer-size=4
                           -grecord-gcc-switches   -m64 -mtune=generic'
                           'CXXFLAGS= -O2 -g -pipe -Wall
                           -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
                           -fstack-protector-strong --param=ssp-buffer-size=4
                           -grecord-gcc-switches   -m64 -mtune=generic'
                           'FC=gfortran' 'FCFLAGS= -O2 -g -pipe -Wall
                           -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
                           -fstack-protector-strong --param=ssp-buffer-size=4
                           -grecord-gcc-switches   -m64 -mtune=generic'
                 Built by: mockbuild
                 Built on: Wed Jun 13 14:25:02 EDT 2018
               Built host: x86-041.build.eng.bos.redhat.com
               C bindings: yes
             C++ bindings: yes
              Fort mpif.h: yes (all)
             Fort use mpi: yes (limited: overloading)
        Fort use mpi size: deprecated-ompi-info-value
         Fort use mpi_f08: no
  Fort mpi_f08 compliance: The mpi_f08 module was not built
   Fort mpi_f08 subarrays: no
            Java bindings: no
   Wrapper compiler rpath: runpath
               C compiler: gcc
      C compiler absolute: /usr/bin/gcc
   C compiler family name: GNU
       C compiler version: 4.8.5
             C++ compiler: g++
    C++ compiler absolute: /usr/bin/g++
            Fort compiler: gfortran
        Fort compiler abs: /usr/bin/gfortran
          Fort ignore TKR: no
    Fort 08 assumed shape: no
       Fort optional args: no
           Fort INTERFACE: yes
     Fort ISO_FORTRAN_ENV: yes
        Fort STORAGE_SIZE: no
       Fort BIND(C) (all): no
       Fort ISO_C_BINDING: yes
  Fort SUBROUTINE BIND(C): no
        Fort TYPE,BIND(C): no
  Fort T,BIND(C,name="a"): no
             Fort PRIVATE: no
           Fort PROTECTED: no
            Fort ABSTRACT: no
        Fort ASYNCHRONOUS: no
           Fort PROCEDURE: no
          Fort USE...ONLY: no
            Fort C_FUNLOC: no
  Fort f08 using wrappers: no
          Fort MPI_SIZEOF: no
              C profiling: yes
            C++ profiling: yes
    Fort mpif.h profiling: yes
   Fort use mpi profiling: yes
    Fort use mpi_f08 prof: no
           C++ exceptions: no
           Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                           OMPI progress: no, ORTE progress: yes, Event lib:
                           yes)
            Sparse Groups: no
   Internal debug support: no
   MPI interface warnings: yes
      MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
               dl support: yes
    Heterogeneous support: no
  mpirun default --prefix: no
          MPI I/O support: yes
        MPI_WTIME support: native
      Symbol vis. support: yes
    Host topology support: yes
           MPI extensions: affinity, cuda
    FT Checkpoint support: no (checkpoint thread: no)
    C/R Enabled Debugging: no
   MPI_MAX_PROCESSOR_NAME: 256
     MPI_MAX_ERROR_STRING: 256
      MPI_MAX_OBJECT_NAME: 64
         MPI_MAX_INFO_KEY: 36
         MPI_MAX_INFO_VAL: 256
        MPI_MAX_PORT_NAME: 1024
   MPI_MAX_DATAREP_STRING: 128
            MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v3.0.2)
            MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v3.0.2)
            MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA btl: openib (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA btl: self (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA btl: tcp (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v3.0.2)
             MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v3.0.2)
             MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA crs: none (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                   MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v3.0.2)
                MCA event: libevent2022 (MCA v2.1.0, API v2.0.0, Component
                           v3.0.2)
                MCA hwloc: external (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                   MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
                           v3.0.2)
                   MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
                           v3.0.2)
          MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v3.0.2)
          MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v3.0.2)
           MCA memchecker: valgrind (MCA v2.1.0, API v2.0.0, Component v3.0.2)
               MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v3.0.2)
              MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
                           v3.0.2)
                 MCA pmix: pmix2x (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                 MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                 MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v3.0.2)
               MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v3.0.2)
                MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA dfs: app (MCA v2.1.0, API v1.0.0, Component v3.0.2)
                  MCA dfs: orted (MCA v2.1.0, API v1.0.0, Component v3.0.2)
                  MCA dfs: test (MCA v2.1.0, API v1.0.0, Component v3.0.2)
               MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component
                           v3.0.2)
               MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component
                           v3.0.2)
               MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component
                           v3.0.2)
               MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component
                           v3.0.2)
               MCA errmgr: dvm (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA ess: env (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component
                           v3.0.2)
                  MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v3.0.2)
              MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v3.0.2)
             MCA notifier: syslog (MCA v2.1.0, API v1.0.0, Component v3.0.2)
                 MCA odls: default (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA oob: ud (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component
                           v3.0.2)
                  MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component
                           v3.0.2)
                  MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component
                           v3.0.2)
                MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component
                           v3.0.2)
                MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component
                           v3.0.2)
                MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v3.0.2)
               MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v3.0.2)
               MCA routed: debruijn (MCA v2.1.0, API v3.0.0, Component v3.0.2)
               MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v3.0.2)
               MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v3.0.2)
               MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v3.0.2)
               MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v3.0.2)
               MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v3.0.2)
               MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v3.0.2)
                MCA state: app (MCA v2.1.0, API v1.0.0, Component v3.0.2)
                MCA state: dvm (MCA v2.1.0, API v1.0.0, Component v3.0.2)
                MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v3.0.2)
                MCA state: novm (MCA v2.1.0, API v1.0.0, Component v3.0.2)
                MCA state: orted (MCA v2.1.0, API v1.0.0, Component v3.0.2)
                MCA state: tool (MCA v2.1.0, API v1.0.0, Component v3.0.2)
                  MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                 MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                 MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                 MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                 MCA coll: self (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                 MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                 MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                 MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                 MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
                           v3.0.2)
                MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
                           v3.0.2)
                MCA fcoll: static (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component
                           v3.0.2)
                   MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                   MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                   MCA io: romio314 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA mtl: psm2 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA pml: v (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component
                           v3.0.2)
                  MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v3.0.2)
             MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
                           v3.0.2)
             MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
                           v3.0.2)
             MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                 MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
                           v3.0.2)
                 MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v3.0.2)
            MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
                           v3.0.2)
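
For anyone skimming a dump like the one above, here is a minimal Python sketch that pulls out the two things that matter for this problem: which MCA components failed to open, and which ones are listed as available. The `SAMPLE` text is an excerpt copied from the output above; the function name `summarize` and the regular expressions are illustrative assumptions, not part of Open MPI.

```python
import re
from collections import defaultdict

# Excerpt of the ompi_info output above (warning lines rejoined onto one line each).
SAMPLE = """\
[lustwzb34:10457] mca_base_component_repository_open: unable to open mca_btl_usnic: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
[lustwzb34:10457] mca_base_component_repository_open: unable to open mca_mtl_ofi: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
[lustwzb34:10457] mca_base_component_repository_open: unable to open mca_mtl_psm: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
                  MCA btl: openib (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA btl: self (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA btl: tcp (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v3.0.2)
                  MCA mtl: psm2 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
                  MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v3.0.2)
"""

# "unable to open mca_<framework>_<component>: <missing library>:"
FAILED_RE = re.compile(r"unable to open\s+mca_(\w+?)_(\w+):\s+(\S+):")
# "MCA <framework>: <component> (MCA v..."
LOADED_RE = re.compile(r"^\s*MCA\s+(\w+):\s+(\w+)\s+\(", re.MULTILINE)


def summarize(text):
    """Return (failed, loaded) parsed from ompi_info output.

    failed: list of (framework, component, missing_library) tuples
    loaded: dict mapping framework name to a list of component names
    """
    # Collapse whitespace so warnings wrapped by a mail client still match.
    flat = " ".join(text.split())
    failed = [m.groups() for m in FAILED_RE.finditer(flat)]

    loaded = defaultdict(list)
    for m in LOADED_RE.finditer(text):
        loaded[m.group(1)].append(m.group(2))
    return failed, dict(loaded)


if __name__ == "__main__":
    failed, loaded = summarize(SAMPLE)
    for framework, component, lib in failed:
        print(f"not loaded: {framework}/{component} (missing {lib})")
    for framework in ("btl", "mtl", "pml"):
        print(f"{framework}: {', '.join(loaded.get(framework, []))}")
```

On this output the three skipped components are usnic, ofi, and psm, all for the same missing library, while the openib BTL is still listed as available. That suggests the warnings are probably not the direct cause of the openib message quoted below, though that is an inference rather than a confirmed diagnosis.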


Quoting John Hearns <hearnsj at googlemail.com>:

> Hello Faraz. Please start by running this command: ompi_info
>
> On Tue, 30 Apr 2019 at 15:15, Faraz Hussain <info at feacluster.com> wrote:
>
>> I installed Red Hat 7.5 on two machines with the following Mellanox cards:
>>
>> 87:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
>>
>> I followed the steps outlined here to verify RDMA is working:
>>
>>
>> https://community.mellanox.com/s/article/howto-enable-perftest-package-for-upstream-kernel
>>
>> However, I cannot seem to get Open MPI 3.0.2 to work. When I run it, I
>> get this error:
>>
>> --------------------------------------------------------------------------
>> No OpenFabrics connection schemes reported that they were able to be
>> used on a specific port. As such, the openib BTL (OpenFabrics
>> support) will be disabled for this port.
>>
>>   Local host:      lustwzb34
>>   Local device:    mlx4_0
>>   Local port:      1
>>   CPCs attempted:  rdmacm, udcm
>> --------------------------------------------------------------------------
>>
>> Then it just hangs until I press Ctrl-C.
>>
>> I understand this may be an issue with Red Hat, Open MPI, or Mellanox.
>> Any ideas on how to determine which one is responsible?
>>
>> Thanks!
>>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>
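
One possible way to narrow down which layer is at fault is to run the same job twice: once restricted to TCP, and once forcing the openib BTL with verbose BTL output. The Python sketch below only builds the two `mpirun` command lines. The second host name `node2` and the test program `./mpi_hello` are hypothetical placeholders, and the helper `build_commands` is an illustration, not an Open MPI tool. The MCA parameters used (`btl` and `btl_base_verbose`) are standard Open MPI settings.

```python
import shlex


def build_commands(hosts, program, nprocs=2):
    """Build mpirun command lines that isolate the transport in use.

    hosts:   list of host names (assumption: one process per host)
    program: path to any small MPI test program (hypothetical here)
    """
    base = ["mpirun", "-np", str(nprocs), "--host", ",".join(hosts)]
    return {
        # TCP only: if this works, launching and MPI itself are probably fine.
        "tcp_only": base + ["--mca", "btl", "self,vader,tcp", program],
        # Force openib and raise BTL verbosity to see why the port is rejected.
        "openib_verbose": base + [
            "--mca", "btl", "self,vader,openib",
            "--mca", "btl_base_verbose", "100",
            program,
        ],
    }


if __name__ == "__main__":
    # "lustwzb34" is from the error message; "node2" and "./mpi_hello" are placeholders.
    commands = build_commands(["lustwzb34", "node2"], "./mpi_hello")
    for name, cmd in commands.items():
        print(f"# {name}")
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

If the TCP-only run completes and the openib run still reports the message above, the problem is more likely in the InfiniBand stack or its configuration than in the Open MPI installation itself. This is a diagnostic sketch under those assumptions, not a confirmed fix.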




