[Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?
gus at ldeo.columbia.edu
Tue Apr 30 12:25:32 PDT 2019
By all means, download the Open MPI tarball and build from source.
Otherwise there won't be support for IB (the CentOS Open MPI packages most
likely rely only on TCP/IP).
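
For illustration, a source build with IB support might look roughly like the sketch below. The version number, tarball name, and install prefix are assumptions, not values from this thread, and the helper function names are made up for the example. It assumes the tarball is already in the current directory and that the libibverbs development files are installed.

```shell
# Assumptions (not from the thread): version, tarball name, install prefix.
OMPI_VERSION="${OMPI_VERSION:-3.1.4}"
OMPI_PREFIX="${OMPI_PREFIX:-$HOME/sw/openmpi-$OMPI_VERSION}"

# Hypothetical helper: build and install from a tarball already present in
# the current directory. Passing --with-verbs explicitly makes configure stop
# with an error if the libibverbs headers or library cannot be found, instead
# of quietly building without IB support.
ompi_build() {
    tar xjf "openmpi-$OMPI_VERSION.tar.bz2" &&
    ( cd "openmpi-$OMPI_VERSION" &&
      ./configure --prefix="$OMPI_PREFIX" --with-verbs &&
      make -j 4 &&
      make install )
}

# Hypothetical helper: read `ompi_info` output on stdin and succeed only if
# the openib BTL component is listed.
ompi_has_openib() {
    grep -q 'MCA btl: openib'
}
```

After sourcing the functions, something like `ompi_build && "$OMPI_PREFIX/bin/ompi_info" | ompi_has_openib && echo "openib BTL present"` would build and then confirm that the resulting installation actually contains the IB transport.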
Read their README file (it comes in the tarball), and take a careful look
at their (excellent) FAQ.
Many issues can be solved by just reading these two resources.
If you hit more trouble, subscribe to the Open MPI mailing list and ask
there. You will get advice directly from the Open MPI developers, and the
fix will come easily.
My two cents,
On Tue, Apr 30, 2019 at 3:07 PM Faraz Hussain <info at feacluster.com> wrote:
> Thanks, yes I have installed those libraries. See below. Initially I
> installed the libraries via yum. But then I tried installing the rpms
> directly from the Mellanox website
> (MLNX_OFED_LINUX-4.5-220.127.116.11-rhel7.5-x86_64.tar). Even after doing
> that, I still got the same error with openmpi. I will try your
> suggestion of building openmpi from source next!
> root at lustwzb34:/root # yum list | grep ibverbs
> libibverbs.x86_64 41mlnx1-OFED.18.104.22.168.0.45101
> libibverbs-devel.x86_64 41mlnx1-OFED.22.214.171.124.0.45101
> libibverbs-devel-static.x86_64 41mlnx1-OFED.126.96.36.199.0.45101
> libibverbs-utils.x86_64 41mlnx1-OFED.188.8.131.52.0.45101
> libibverbs.i686 17.2-3.el7
> libibverbs-devel.i686 1.2.1-1.el7
> root at lustwzb34:/root # lsmod | grep ib
> ib_ucm 22602 0
> ib_ipoib 168425 0
> ib_cm 53141 3 rdma_cm,ib_ucm,ib_ipoib
> ib_umad 22093 0
> mlx5_ib 339961 0
> ib_uverbs 121821 3 mlx5_ib,ib_ucm,rdma_ucm
> mlx5_core 919178 2 mlx5_ib,mlx5_fpga_tools
> mlx4_ib 211747 0
> ib_core 294554 10
> mlx4_core 360598 2 mlx4_en,mlx4_ib
> mlx_compat 29012 15
> devlink 42368 4 mlx4_en,mlx4_ib,mlx4_core,mlx5_core
> libcrc32c 12644 3 xfs,nf_nat,nf_conntrack
> root at lustwzb34:/root #
> > Did you install libibverbs (and libibverbs-utils, for information and
> > troubleshooting)?
> > yum list |grep ibverbs
> > Are you loading the ib modules?
> > lsmod |grep ib
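
The module check quoted above can also be scripted, so that it reports what is missing rather than leaving you to scan the list by eye. This is only a sketch: the function name is made up, and which modules count as required depends on your hardware, so the names in the usage line are an illustrative choice taken from the lsmod output in this thread.

```shell
# Hypothetical helper: read `lsmod` output on stdin and print every module
# named on the command line that does NOT appear in the first column.
missing_modules() {
    loaded=$(awk '{ print $1 }')   # first column holds the module names
    for m in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string
        printf '%s\n' "$loaded" | grep -qxF -- "$m" || printf '%s\n' "$m"
    done
}
```

Usage would be along the lines of `lsmod | missing_modules ib_core ib_uverbs mlx5_ib`. Empty output means every listed module is loaded.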