[Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

Faraz Hussain info at feacluster.com
Tue Apr 30 07:14:16 PDT 2019


I installed Red Hat 7.5 on two machines with the following Mellanox cards:

87:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

I followed the steps outlined here to verify RDMA is working:

https://community.mellanox.com/s/article/howto-enable-perftest-package-for-upstream-kernel

However, I cannot seem to get Open MPI 3.0.2 to work. When I run it, I
get this error:

--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:        lustwzb34
  Local device:      mlx4_0
  Local port:        1
  CPCs attempted:    rdmacm, udcm
--------------------------------------------------------------------------

Then it just hangs until I press Ctrl-C.

I understand this may be an issue with Red Hat, Open MPI, or Mellanox.
Any ideas on how to narrow down which of the three is at fault?
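
One way I could imagine narrowing this down is to run the same job twice, once restricted to the TCP BTL and once restricted to the openib BTL with verbose BTL logging turned on. The sketch below only builds the two mpirun command lines. The second host name (node2) and the test binary (./hello_mpi) are hypothetical placeholders, and I am assuming the standard "btl" and "btl_base_verbose" MCA parameters apply to this 3.0.2 build.

```python
def build_mpirun_cmd(btls, hosts, program, np=2, verbose=False):
    """Build an mpirun argument list that restricts Open MPI to the given BTLs."""
    cmd = [
        "mpirun",
        "-np", str(np),
        "--host", ",".join(hosts),
        "--mca", "btl", ",".join(btls),
    ]
    if verbose:
        # Raise BTL logging so the openib setup messages are printed.
        cmd += ["--mca", "btl_base_verbose", "100"]
    cmd.append(program)
    return cmd


# Hypothetical values: "node2" and "./hello_mpi" are placeholders,
# only "lustwzb34" appears in the error output above.
HOSTS = ["lustwzb34", "node2"]
PROGRAM = "./hello_mpi"

# Run 1: TCP only. If this also hangs, the problem is probably not InfiniBand.
tcp_cmd = build_mpirun_cmd(["self", "tcp"], HOSTS, PROGRAM)

# Run 2: openib only, with verbose output to see why no CPC was usable.
ib_cmd = build_mpirun_cmd(["self", "openib"], HOSTS, PROGRAM, verbose=True)

print(" ".join(tcp_cmd))
print(" ".join(ib_cmd))
```

If the TCP-only run completes and the openib-only run fails, that would point at the OpenFabrics stack or the openib BTL configuration rather than at the MPI launch itself.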

Thanks!


