[Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

Gus Correa gus at ldeo.columbia.edu
Tue Apr 30 12:25:32 PDT 2019


Hi Faraz

By all means, download the Open MPI tarball and build from source.
Otherwise there probably won't be support for InfiniBand (IB), since the
CentOS Open MPI packages most likely rely only on TCP/IP.
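
As a rough sketch of what such a build might look like (not a definitive
recipe): the function names, the source directory, and the install prefix
below are hypothetical placeholders, and I am assuming the usual
configure / make / make install sequence with the --with-verbs option,
which should make configure stop if the verbs libraries are not found.
The second function just scans ompi_info output for an IB-capable
component.

```shell
#!/bin/sh
# Sketch only: build_openmpi and has_ib_component are illustrative names,
# not part of Open MPI.

# Build Open MPI from an already-extracted source tree.
#   $1 = extracted source directory (hypothetical path)
#   $2 = install prefix (hypothetical path)
build_openmpi() {
    srcdir=$1
    prefix=$2
    (
        cd "$srcdir" || exit 1
        # Assumption: --with-verbs asks configure to require verbs (IB) support.
        ./configure --prefix="$prefix" --with-verbs || exit 1
        make -j 4 all || exit 1
        make install
    )
}

# Read `ompi_info` output on stdin; succeed if an IB-capable component
# (the openib BTL or the UCX PML) is listed.
has_ib_component() {
    grep -Eq 'MCA (btl: openib|pml: ucx)'
}
```

After building, something like
`"$PREFIX/bin/ompi_info" | has_ib_component && echo "IB component found"`
(with PREFIX set to your own install location) would tell you whether the
new build picked up IB at all.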

Read their README file (it comes in the tarball), and take a careful look
at their (excellent) FAQ:
https://www.open-mpi.org/faq/
Many issues can be solved by just reading these two resources.

If you hit more trouble, subscribe to the Open MPI mailing list and ask
questions there. You will get advice directly from the Open MPI developers,
and the fix will come easily.
https://www.open-mpi.org/community/lists/ompi.php

My two cents,
Gus Correa

On Tue, Apr 30, 2019 at 3:07 PM Faraz Hussain <info at feacluster.com> wrote:

> Thanks, yes I have installed those libraries. See below. Initially I
> installed the libraries via yum. But then I tried installing the rpms
> directly from Mellanox website (
> MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.5-x86_64.tar ). Even after doing
> that, I still got the same error with openmpi. I will try your
> suggestion of building openmpi from source next!
>
> root@lustwzb34:/root # yum list | grep ibverbs
> libibverbs.x86_64                     41mlnx1-OFED.4.5.0.1.0.45101
> libibverbs-devel.x86_64               41mlnx1-OFED.4.5.0.1.0.45101
> libibverbs-devel-static.x86_64        41mlnx1-OFED.4.5.0.1.0.45101
> libibverbs-utils.x86_64               41mlnx1-OFED.4.5.0.1.0.45101
> libibverbs.i686                       17.2-3.el7                     rhel-7-server-rpms
> libibverbs-devel.i686                 1.2.1-1.el7                    rhel-7-server-rpms
>
> root@lustwzb34:/root # lsmod | grep ib
> ib_ucm                 22602  0
> ib_ipoib              168425  0
> ib_cm                  53141  3 rdma_cm,ib_ucm,ib_ipoib
> ib_umad                22093  0
> mlx5_ib               339961  0
> ib_uverbs             121821  3 mlx5_ib,ib_ucm,rdma_ucm
> mlx5_core             919178  2 mlx5_ib,mlx5_fpga_tools
> mlx4_ib               211747  0
> ib_core               294554  10 rdma_cm,ib_cm,iw_cm,mlx4_ib,mlx5_ib,ib_ucm,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib
> mlx4_core             360598  2 mlx4_en,mlx4_ib
> mlx_compat             29012  15 rdma_cm,ib_cm,iw_cm,mlx4_en,mlx4_ib,mlx5_ib,mlx5_fpga_tools,ib_ucm,ib_core,ib_umad,ib_uverbs,mlx4_core,mlx5_core,rdma_ucm,ib_ipoib
> devlink                42368  4 mlx4_en,mlx4_ib,mlx4_core,mlx5_core
> libcrc32c              12644  3 xfs,nf_nat,nf_conntrack
> root@lustwzb34:/root #
>
>
>
> > Did you install libibverbs  (and libibverbs-utils, for information and
> > troubleshooting)?
>
> > yum list |grep ibverbs
>
> > Are you loading the ib modules?
>
> > lsmod |grep ib
>
>