<div dir="ltr"><div>Hi Faraz. Could you make another summary for us?</div><div>What hardware and which InfiniBand switch do you have?</div><div>Run these commands: ibdiagnet smshow</div><div><br></div><div>You originally had the Open MPI which was provided by CentOS??</div><div><br></div><div>You compiled Open MPI from source??</div><div>How are you bringing the new Open MPI version into your PATH?? Are you using modules or an MPI switcher utility?</div><div><br></div></div><br><div class="gmail_quote"><div class="gmail_attr" dir="ltr">On Wed, 1 May 2019 at 09:39, Benson Muite <<a href="mailto:benson_muite@emailplus.org">benson_muite@emailplus.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">
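On the PATH question, a minimal sketch of the two usual approaches. The prefix `$HOME/openmpi-4.0` and the module name `openmpi/4.0` are illustrative assumptions, not Faraz's actual install location:

```shell
# Assuming Open MPI was configured with --prefix=$HOME/openmpi-4.0
# (illustrative path; substitute whatever prefix was actually used at build time):
OMPI_HOME="$HOME/openmpi-4.0"
export PATH="$OMPI_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_HOME/lib:${LD_LIBRARY_PATH:-}"
# With environment modules, the equivalent would be:  module load openmpi/4.0
# Check which installation is picked up first:
echo "$PATH" | cut -d: -f1
```

Whichever mechanism is used, `which mpirun` and `mpirun --version` should then report the source-built 4.x, not the CentOS package.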
<div bgcolor="#FFFFFF">
<p>Hi Faraz,</p>
<p>Have you tried any other MPI implementations (e.g. MPICH, MVAPICH)?</p>
<p>Regards,</p>
<p>Benson<br>
</p>
<div class="gmail-m_1284415573582704506moz-cite-prefix">On 4/30/19 11:20 PM, Gus Correa wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div>It may be using IPoIB (TCP/IP over IB), not verbs/rdma.
<br>
</div>
<div>You can force it to use openib (verbs, RDMA) with the command below
(vader is the BTL for in-node shared memory):</div>
<div><br>
</div>
<div>
<pre class="gmail-m_1284415573582704506gmail-de1"><span class="gmail-m_1284415573582704506gmail-co4"></span><span class="gmail-m_1284415573582704506gmail-kw2">mpirun</span> <span class="gmail-m_1284415573582704506gmail-re5">--mca</span> btl openib,self,vader ...
</pre>
<pre class="gmail-m_1284415573582704506gmail-de1">This flag may also help show which BTL (byte transfer layer) is being used:
<code>--mca btl_base_verbose 30</code></pre>
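Put together with Faraz's earlier osu_latency invocation, that would look like the sketch below (the hostfile and binary names are taken from his transcript; adjust as needed):

```shell
mpirun --mca btl openib,self,vader --mca btl_base_verbose 30 \
       -np 2 -hostfile ./hostfile ./osu_latency
```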
<pre class="gmail-m_1284415573582704506gmail-de1">See these FAQ entries:
<a href="https://www.open-mpi.org/faq/?category=openfabrics#ib-btl" target="_blank">https://www.open-mpi.org/faq/?category=openfabrics#ib-btl</a>
<a href="https://www.open-mpi.org/faq/?category=all#tcp-routability-1.3" target="_blank">https://www.open-mpi.org/faq/?category=all#tcp-routability-1.3</a>
</pre>
<pre class="gmail-m_1284415573582704506gmail-de1"><font face="arial,helvetica,sans-serif">It is really better to ask for more details on the Open MPI list. They are the pros!
</font></pre>
<pre class="gmail-m_1284415573582704506gmail-de1"><font face="arial,helvetica,sans-serif">My two cents,
Gus Correa
</font></pre>
</div>
<div><br>
</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div class="gmail_attr" dir="ltr">On Tue, Apr 30, 2019 at 3:57
PM Faraz Hussain <<a href="mailto:info@feacluster.com" target="_blank">info@feacluster.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">Thanks,
after building Open MPI 4 from source, it now works! However it<br>
still gives this message below when I run Open MPI with the verbose
setting:<br>
<br>
No OpenFabrics connection schemes reported that they were able
to be<br>
used on a specific port. As such, the openib BTL (OpenFabrics<br>
support) will be disabled for this port.<br>
<br>
Local host: lustwzb34<br>
Local device: mlx4_0<br>
Local port: 1<br>
CPCs attempted: rdmacm, udcm<br>
<br>
However, the results from my latency and bandwidth tests seem
to be <br>
what I would expect from infiniband. See:<br>
<br>
[hussaif1@lustwzb34 pt2pt]$ mpirun -v -np 2 -hostfile ./hostfile ./osu_latency<br>
# OSU MPI Latency Test v5.3.2<br>
# Size Latency (us)<br>
0 1.87<br>
1 1.88<br>
2 1.93<br>
4 1.92<br>
8 1.93<br>
16 1.95<br>
32 1.93<br>
64 2.08<br>
128 2.61<br>
256 2.72<br>
512 2.93<br>
1024 3.33<br>
2048 3.81<br>
4096 4.71<br>
8192 6.68<br>
16384 8.38<br>
32768 12.13<br>
65536 19.74<br>
131072 35.08<br>
262144 64.67<br>
524288 122.11<br>
1048576 236.69<br>
2097152 465.97<br>
4194304 926.31<br>
<br>
[hussaif1@lustwzb34 pt2pt]$ mpirun -v -np 2 -hostfile ./hostfile ./osu_bw<br>
# OSU MPI Bandwidth Test v5.3.2<br>
# Size Bandwidth (MB/s)<br>
1 3.09<br>
2 6.35<br>
4 12.77<br>
8 26.01<br>
16 51.31<br>
32 103.08<br>
64 197.89<br>
128 362.00<br>
256 676.28<br>
512 1096.26<br>
1024 1819.25<br>
2048 2551.41<br>
4096 3886.63<br>
8192 3983.17<br>
16384 4362.30<br>
32768 4457.09<br>
65536 4502.41<br>
131072 4512.64<br>
262144 4531.48<br>
524288 4537.42<br>
1048576 4510.69<br>
2097152 4546.64<br>
4194304 4565.12<br>
<br>
When I run ibv_devinfo I get:<br>
<br>
[hussaif1@lustwzb34 pt2pt]$ ibv_devinfo<br>
hca_id: mlx4_0<br>
transport: InfiniBand (0)<br>
fw_ver: 2.36.5000<br>
node_guid: 480f:cfff:fff5:c6c0<br>
sys_image_guid: 480f:cfff:fff5:c6c3<br>
vendor_id: 0x02c9<br>
vendor_part_id: 4103<br>
hw_ver: 0x0<br>
board_id: HP_1360110017<br>
phys_port_cnt: 2<br>
Device ports:<br>
port: 1<br>
state: PORT_ACTIVE (4)<br>
max_mtu: 4096 (5)<br>
active_mtu: 1024 (3)<br>
sm_lid: 0<br>
port_lid: 0<br>
port_lmc: 0x00<br>
link_layer: Ethernet<br>
<br>
port: 2<br>
state: PORT_DOWN (1)<br>
max_mtu: 4096 (5)<br>
active_mtu: 1024 (3)<br>
sm_lid: 0<br>
port_lid: 0<br>
port_lmc: 0x00<br>
link_layer: Ethernet<br>
<br>
I will ask the Open MPI mailing list if my results make sense!<br>
<br>
<br>
Quoting Gus Correa <<a href="mailto:gus@ldeo.columbia.edu" target="_blank">gus@ldeo.columbia.edu</a>>:<br>
<br>
> Hi Faraz<br>
><br>
> By all means, download the Open MPI tarball and build
from source.<br>
> Otherwise there won't be support for IB (the CentOS Open
MPI packages most<br>
> likely rely only on TCP/IP).<br>
><br>
> Read their README file (it comes in the tarball), and
take a careful look<br>
> at their (excellent) FAQ:<br>
> <a href="https://www.open-mpi.org/faq/" target="_blank" rel="noreferrer">https://www.open-mpi.org/faq/</a><br>
> Many issues can be solved by just reading these two
resources.<br>
><br>
> If you hit more trouble, subscribe to the Open MPI
mailing list, and ask<br>
> questions there,<br>
> because you will get advice directly from the Open MPI
developers, and the<br>
> fix will come easy.<br>
> <a href="https://www.open-mpi.org/community/lists/ompi.php" target="_blank" rel="noreferrer">https://www.open-mpi.org/community/lists/ompi.php</a><br>
><br>
> My two cents,<br>
> Gus Correa<br>
><br>
> On Tue, Apr 30, 2019 at 3:07 PM Faraz Hussain <<a href="mailto:info@feacluster.com" target="_blank">info@feacluster.com</a>> wrote:<br>
><br>
>> Thanks, yes I have installed those libraries. See
below. Initially I<br>
>> installed the libraries via yum. But then I tried
installing the rpms<br>
>> directly from Mellanox website (<br>
>> MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.5-x86_64.tar ).
Even after doing<br>
>> that, I still got the same error with openmpi. I will
try your<br>
>> suggestion of building openmpi from source next!<br>
>><br>
>> root@lustwzb34:/root # yum list | grep ibverbs<br>
>> libibverbs.x86_64 41mlnx1-OFED.4.5.0.1.0.45101<br>
>> libibverbs-devel.x86_64 41mlnx1-OFED.4.5.0.1.0.45101<br>
>> libibverbs-devel-static.x86_64 41mlnx1-OFED.4.5.0.1.0.45101<br>
>> libibverbs-utils.x86_64 41mlnx1-OFED.4.5.0.1.0.45101<br>
>> libibverbs.i686 17.2-3.el7 rhel-7-server-rpms<br>
>> libibverbs-devel.i686 1.2.1-1.el7 rhel-7-server-rpms<br>
>><br>
>> root@lustwzb34:/root # lsmod | grep ib<br>
>> ib_ucm 22602 0<br>
>> ib_ipoib 168425 0<br>
>> ib_cm 53141 3 rdma_cm,ib_ucm,ib_ipoib<br>
>> ib_umad 22093 0<br>
>> mlx5_ib 339961 0<br>
>> ib_uverbs 121821 3 mlx5_ib,ib_ucm,rdma_ucm<br>
>> mlx5_core 919178 2 mlx5_ib,mlx5_fpga_tools<br>
>> mlx4_ib 211747 0<br>
>> ib_core 294554 10
rdma_cm,ib_cm,iw_cm,mlx4_ib,mlx5_ib,ib_ucm,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib<br>
>> mlx4_core 360598 2 mlx4_en,mlx4_ib<br>
>> mlx_compat 29012 15
rdma_cm,ib_cm,iw_cm,mlx4_en,mlx4_ib,mlx5_ib,mlx5_fpga_tools,ib_ucm,ib_core,ib_umad,ib_uverbs,mlx4_core,mlx5_core,rdma_ucm,ib_ipoib<br>
>> devlink 42368 4 mlx4_en,mlx4_ib,mlx4_core,mlx5_core<br>
>> libcrc32c 12644 3 xfs,nf_nat,nf_conntrack<br>
>> root@lustwzb34:/root #<br>
>><br>
>><br>
>><br>
>> > Did you install libibverbs (and
libibverbs-utils, for information and<br>
>> > troubleshooting)?<br>
>><br>
>> > yum list |grep ibverbs<br>
>><br>
>> > Are you loading the ib modules?<br>
>><br>
>> > lsmod |grep ib<br>
>><br>
>><br>
<br>
<br>
<br>
</blockquote>
</div>
<br>
<fieldset class="gmail-m_1284415573582704506mimeAttachmentHeader"></fieldset>
<pre class="gmail-m_1284415573582704506moz-quote-pre">_______________________________________________
Beowulf mailing list, <a class="gmail-m_1284415573582704506moz-txt-link-abbreviated" href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit <a class="gmail-m_1284415573582704506moz-txt-link-freetext" href="https://beowulf.org/cgi-bin/mailman/listinfo/beowulf" target="_blank">https://beowulf.org/cgi-bin/mailman/listinfo/beowulf</a>
</pre>
</blockquote>
</div>
</blockquote></div>