<div dir="ltr"><div>Hi Faraz. Could you put together another summary for us?</div><div>What hardware and which InfiniBand switch do you have?</div><div>Run these commands: ibdiagnet, smshow</div><div><br></div><div>You originally had the OpenMPI that was provided by CentOS?</div><div><br></div><div>You compiled OpenMPI from source?</div><div>How are you bringing the new OpenMPI version into your PATH? Are you using modules or an MPI switcher utility?</div><div><br></div><div><br></div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div class="gmail_attr" dir="ltr">On Wed, 1 May 2019 at 09:39, Benson Muite <<a href="mailto:benson_muite@emailplus.org">benson_muite@emailplus.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">
  
    
  
  <div bgcolor="#FFFFFF">
    <p>Hi Faraz,</p>
    <p>Have you tried any other MPI distributions (e.g., MPICH, MVAPICH)?</p>
    <p>Regards,</p>
    <p>Benson<br>
    </p>
    <div class="gmail-m_1284415573582704506moz-cite-prefix">On 4/30/19 11:20 PM, Gus Correa wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div dir="ltr">
          <div dir="ltr">
            <div>It may be using IPoIB (TCP/IP over IB), not verbs/RDMA.
              <br>
            </div>
            <div>You can force it to use openib (verbs, RDMA) with the
              following (vader is the in-node shared-memory BTL):</div>
            <div><br>
            </div>
            <div>
              <pre class="gmail-m_1284415573582704506gmail-de1"><span class="gmail-m_1284415573582704506gmail-co4"></span><span class="gmail-m_1284415573582704506gmail-kw2">mpirun</span> <span class="gmail-m_1284415573582704506gmail-re5">--mca</span> btl openib,self,vader ...

</pre>
              <pre class="gmail-m_1284415573582704506gmail-de1">These flags may also help tell which btl (byte transport layer) is being used:

 <code>--mca btl_base_verbose 30</code></pre>
              <pre class="gmail-m_1284415573582704506gmail-de1">See these FAQ:
<a href="https://www.open-mpi.org/faq/?category=openfabrics#ib-btl" target="_blank">https://www.open-mpi.org/faq/?category=openfabrics#ib-btl</a>
<a href="https://www.open-mpi.org/faq/?category=all#tcp-routability-1.3" target="_blank">https://www.open-mpi.org/faq/?category=all#tcp-routability-1.3</a>
</pre>
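<pre class="gmail-m_1284415573582704506gmail-de1">Putting the two options together, a sketch of one full invocation (the hostfile and benchmark paths are just placeholders for your own setup):

```shell
# Force the openib BTL (plus self and vader for local ranks) and
# ask the BTL framework to report which transport it actually selects.
mpirun --mca btl openib,self,vader \
       --mca btl_base_verbose 30 \
       -np 2 -hostfile ./hostfile ./osu_latency
```

If openib cannot be used on any port, this will fail loudly instead of silently falling back to TCP, which is a quick way to confirm what the verbose output is telling you.
</pre>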
              <pre class="gmail-m_1284415573582704506gmail-de1"><font face="arial,helvetica,sans-serif">Better really ask more details in the Open MPI list. They are the pros!
</font></pre>
              <pre class="gmail-m_1284415573582704506gmail-de1"><font face="arial,helvetica,sans-serif">My two cents,
Gus Correa
</font></pre>
              <pre class="gmail-m_1284415573582704506gmail-de1"><font face="arial,helvetica,sans-serif">
</font></pre>
              <pre class="gmail-m_1284415573582704506gmail-de1"></pre>
            </div>
            <div><br>
            </div>
          </div>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div class="gmail_attr" dir="ltr">On Tue, Apr 30, 2019 at 3:57
          PM Faraz Hussain <<a href="mailto:info@feacluster.com" target="_blank">info@feacluster.com</a>> wrote:<br>
        </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">Thanks,
          after building openmpi 4 from source, it now works! However,
          it  <br>
          still gives this message below when I run openmpi with verbose
          setting:<br>
          <br>
          No OpenFabrics connection schemes reported that they were able
          to be<br>
          used on a specific port.  As such, the openib BTL (OpenFabrics<br>
          support) will be disabled for this port.<br>
          <br>
             Local host:           lustwzb34<br>
             Local device:         mlx4_0<br>
             Local port:           1<br>
             CPCs attempted:       rdmacm, udcm<br>
          <br>
          However, the results from my latency and bandwidth tests seem
          to be  <br>
          what I would expect from infiniband. See:<br>
          <br>
          [hussaif1@lustwzb34 pt2pt]$  mpirun -v -np 2 -hostfile
          ./hostfile  <br>
          ./osu_latency<br>
          # OSU MPI Latency Test v5.3.2<br>
          # Size          Latency (us)<br>
          0                       1.87<br>
          1                       1.88<br>
          2                       1.93<br>
          4                       1.92<br>
          8                       1.93<br>
          16                      1.95<br>
          32                      1.93<br>
          64                      2.08<br>
          128                     2.61<br>
          256                     2.72<br>
          512                     2.93<br>
          1024                    3.33<br>
          2048                    3.81<br>
          4096                    4.71<br>
          8192                    6.68<br>
          16384                   8.38<br>
          32768                  12.13<br>
          65536                  19.74<br>
          131072                 35.08<br>
          262144                 64.67<br>
          524288                122.11<br>
          1048576               236.69<br>
          2097152               465.97<br>
          4194304               926.31<br>
          <br>
          [hussaif1@lustwzb34 pt2pt]$  mpirun -v -np 2 -hostfile
          ./hostfile ./osu_bw<br>
          # OSU MPI Bandwidth Test v5.3.2<br>
          # Size      Bandwidth (MB/s)<br>
          1                       3.09<br>
          2                       6.35<br>
          4                      12.77<br>
          8                      26.01<br>
          16                     51.31<br>
          32                    103.08<br>
          64                    197.89<br>
          128                   362.00<br>
          256                   676.28<br>
          512                  1096.26<br>
          1024                 1819.25<br>
          2048                 2551.41<br>
          4096                 3886.63<br>
          8192                 3983.17<br>
          16384                4362.30<br>
          32768                4457.09<br>
          65536                4502.41<br>
          131072               4512.64<br>
          262144               4531.48<br>
          524288               4537.42<br>
          1048576              4510.69<br>
          2097152              4546.64<br>
          4194304              4565.12<br>
          <br>
          When I run ibv_devinfo I get:<br>
          <br>
          [hussaif1@lustwzb34 pt2pt]$ ibv_devinfo<br>
          hca_id: mlx4_0<br>
                   transport:                      InfiniBand (0)<br>
                   fw_ver:                         2.36.5000<br>
                   node_guid:                      480f:cfff:fff5:c6c0<br>
                   sys_image_guid:                 480f:cfff:fff5:c6c3<br>
                   vendor_id:                      0x02c9<br>
                   vendor_part_id:                 4103<br>
                   hw_ver:                         0x0<br>
                   board_id:                       HP_1360110017<br>
                   phys_port_cnt:                  2<br>
                   Device ports:<br>
                           port:   1<br>
                                   state:                  PORT_ACTIVE
          (4)<br>
                                   max_mtu:                4096 (5)<br>
                                   active_mtu:             1024 (3)<br>
                                   sm_lid:                 0<br>
                                   port_lid:               0<br>
                                   port_lmc:               0x00<br>
                                   link_layer:             Ethernet<br>
          <br>
                           port:   2<br>
                                   state:                  PORT_DOWN (1)<br>
                                   max_mtu:                4096 (5)<br>
                                   active_mtu:             1024 (3)<br>
                                   sm_lid:                 0<br>
                                   port_lid:               0<br>
                                   port_lmc:               0x00<br>
                                   link_layer:             Ethernet<br>
          <br>
          I will ask the openmpi mailing list whether my results make sense!<br>
          <br>
          <br>
          Quoting Gus Correa <<a href="mailto:gus@ldeo.columbia.edu" target="_blank">gus@ldeo.columbia.edu</a>>:<br>
          <br>
          > Hi Faraz<br>
          ><br>
          > By all means, download the Open MPI tarball and build
          from source.<br>
          > Otherwise there won't be support for IB (the CentOS Open
          MPI packages most<br>
          > likely rely only on TCP/IP).<br>
          ><br>
          > Read their README file (it comes in the tarball), and
          take a careful look<br>
          > at their (excellent) FAQ:<br>
          > <a href="https://www.open-mpi.org/faq/" target="_blank" rel="noreferrer">https://www.open-mpi.org/faq/</a><br>
          > Many issues can be solved by just reading these two
          resources.<br>
          ><br>
          > If you hit more trouble, subscribe to the Open MPI
          mailing list, and ask<br>
          > questions there,<br>
          > because you will get advice directly from the Open MPI
          developers, and the<br>
          > fix will come easy.<br>
          > <a href="https://www.open-mpi.org/community/lists/ompi.php" target="_blank" rel="noreferrer">https://www.open-mpi.org/community/lists/ompi.php</a><br>
          ><br>
          > My two cents,<br>
          > Gus Correa<br>
          ><br>
          > On Tue, Apr 30, 2019 at 3:07 PM Faraz Hussain <<a href="mailto:info@feacluster.com" target="_blank">info@feacluster.com</a>> wrote:<br>
          ><br>
          >> Thanks, yes I have installed those libraries. See
          below. Initially I<br>
          >> installed the libraries via yum. But then I tried
          installing the rpms<br>
          >> directly from Mellanox website (<br>
          >> MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.5-x86_64.tar ).
          Even after doing<br>
          >> that, I still got the same error with openmpi. I will
          try your<br>
          >> suggestion of building openmpi from source next!<br>
          >><br>
          >> root@lustwzb34:/root # yum list | grep ibverbs<br>
          >> libibverbs.x86_64                   
           41mlnx1-OFED.4.5.0.1.0.45101<br>
          >> libibverbs-devel.x86_64             
           41mlnx1-OFED.4.5.0.1.0.45101<br>
          >> libibverbs-devel-static.x86_64       
          41mlnx1-OFED.4.5.0.1.0.45101<br>
          >> libibverbs-utils.x86_64             
           41mlnx1-OFED.4.5.0.1.0.45101<br>
          >> libibverbs.i686                       17.2-3.el7<br>
          >> rhel-7-server-rpms<br>
          >> libibverbs-devel.i686                 1.2.1-1.el7<br>
          >> rhel-7-server-rpms<br>
          >><br>
          >> root@lustwzb34:/root # lsmod | grep ib<br>
          >> ib_ucm                 22602  0<br>
          >> ib_ipoib              168425  0<br>
          >> ib_cm                  53141  3
          rdma_cm,ib_ucm,ib_ipoib<br>
          >> ib_umad                22093  0<br>
          >> mlx5_ib               339961  0<br>
          >> ib_uverbs             121821  3
          mlx5_ib,ib_ucm,rdma_ucm<br>
          >> mlx5_core             919178  2
          mlx5_ib,mlx5_fpga_tools<br>
          >> mlx4_ib               211747  0<br>
          >> ib_core               294554  10<br>
          >><br>
          >>
rdma_cm,ib_cm,iw_cm,mlx4_ib,mlx5_ib,ib_ucm,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib<br>
          >> mlx4_core             360598  2 mlx4_en,mlx4_ib<br>
          >> mlx_compat             29012  15<br>
          >><br>
          >>
rdma_cm,ib_cm,iw_cm,mlx4_en,mlx4_ib,mlx5_ib,mlx5_fpga_tools,ib_ucm,ib_core,ib_umad,ib_uverbs,mlx4_core,mlx5_core,rdma_ucm,ib_ipoib<br>
          >> devlink                42368  4
          mlx4_en,mlx4_ib,mlx4_core,mlx5_core<br>
          >> libcrc32c              12644  3
          xfs,nf_nat,nf_conntrack<br>
          >> root@lustwzb34:/root #<br>
          >><br>
          >><br>
          >><br>
          >> > Did you install libibverbs  (and
          libibverbs-utils, for information and<br>
          >> > troubleshooting)?<br>
          >><br>
          >> > yum list |grep ibverbs<br>
          >><br>
          >> > Are you loading the ib modules?<br>
          >><br>
          >> > lsmod |grep ib<br>
          >><br>
          >><br>
          <br>
          <br>
          <br>
        </blockquote>
      </div>
      <br>
      <fieldset class="gmail-m_1284415573582704506mimeAttachmentHeader"></fieldset>
      <pre class="gmail-m_1284415573582704506moz-quote-pre">_______________________________________________
Beowulf mailing list, <a class="gmail-m_1284415573582704506moz-txt-link-abbreviated" href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit <a class="gmail-m_1284415573582704506moz-txt-link-freetext" href="https://beowulf.org/cgi-bin/mailman/listinfo/beowulf" target="_blank">https://beowulf.org/cgi-bin/mailman/listinfo/beowulf</a>
</pre>
    </blockquote>
  </div>

_______________________________________________<br>
Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>
To change your subscription (digest mode or unsubscribe) visit <a href="https://beowulf.org/cgi-bin/mailman/listinfo/beowulf" target="_blank" rel="noreferrer">https://beowulf.org/cgi-bin/mailman/listinfo/beowulf</a><br>
</blockquote></div>