[Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?
Faraz Hussain
info at feacluster.com
Tue Apr 30 07:14:16 PDT 2019
I installed Red Hat 7.5 on two machines with the following Mellanox cards:
87:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
I followed the steps outlined here to verify RDMA is working:
https://community.mellanox.com/s/article/howto-enable-perftest-package-for-upstream-kernel
However, I cannot seem to get Open MPI 3.0.2 to work. When I run it, I
get this error:
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
Local host: lustwzb34
Local device: mlx4_0
Local port: 1
CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------
Then it just hangs until I press Ctrl-C.
I understand this may be an issue with Red Hat, Open MPI, or Mellanox.
Any ideas on how to narrow down which of the three is at fault?
Thanks!
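
One possible first check, before looking at Open MPI itself, is to read what the kernel reports for the port named in the error (device mlx4_0, port 1). The sketch below is only an illustration and not a confirmed diagnosis: the function name read_port_info and its sysfs_root parameter are made up for this example, and it assumes the standard Linux sysfs layout under /sys/class/infiniband. A port that is not ACTIVE, or whose link layer is Ethernet rather than InfiniBand, may be one reason the rdmacm and udcm connection schemes cannot be used, but that is an assumption to verify.

```python
from pathlib import Path


def read_port_info(device="mlx4_0", port=1, sysfs_root="/sys/class/infiniband"):
    """Return the state and link layer the kernel reports for one RDMA port.

    read_port_info and sysfs_root are hypothetical names for this sketch;
    the default device and port are taken from the error message.
    """
    port_dir = Path(sysfs_root) / device / "ports" / str(port)
    info = {}
    for name in ("state", "link_layer"):
        path = port_dir / name
        # A missing file usually means the device or port is not present.
        info[name] = path.read_text().strip() if path.is_file() else None
    return info


if __name__ == "__main__":
    info = read_port_info()
    print(info)
    if info["state"] is None:
        print("Port not found in sysfs: check that the mlx4_ib module is loaded.")
    elif "ACTIVE" not in info["state"]:
        print("Port is not ACTIVE: check cabling and the subnet manager.")
    elif info["link_layer"] == "Ethernet":
        print("Port runs Ethernet (RoCE): rdmacm may need an IP address on it.")
```

Running this on both machines and comparing the output would show whether the two ports are in the same state before any MPI job is started.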