[Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

Benson Muite benson_muite at emailplus.org
Thu May 2 10:35:20 PDT 2019


Hi Faraz,

Mellanox manuals can be found at:

https://docs.mellanox.com/

Example setup instructions (these may not match your system, as I do not 
have exact details of your hardware):

https://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_User_Manual_v4_3.pdf

This may also be helpful (students who have participated in cluster 
competitions are usually quite good at setting these up):

https://www.slothparadise.com/setting-infiniband-centos-6-7/

If you will primarily be running finite element software, investing time 
now in understanding performance analysis can pay off in the future. 
Disclosure: I have an interest in seeing more performance tests on your 
hardware.
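
A minimal sketch of the read-only checks Chris suggests in the quoted 
message below, which can be run before installing anything. The helper 
names (have, check_rdma_stack) are hypothetical, and the ibv_devinfo step 
is an added assumption that is not mentioned in this thread. It only 
reports what is present and changes nothing on the system:

```shell
#!/bin/sh
# Hypothetical helper: report whether the RDMA userspace pieces discussed
# in this thread appear to be present. Read-only; installs nothing.

# True if the named command is available on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

check_rdma_stack() {
    # 1. Is the rdma-core package installed? (RPM-based systems only.)
    if have rpm; then
        if rpm -q rdma-core >/dev/null 2>&1; then
            echo "rdma-core: installed"
        else
            echo "rdma-core: NOT installed"
        fi
    else
        echo "rdma-core: skipped (rpm not found)"
    fi

    # 2. Does systemd know about an rdma unit?
    if have systemctl; then
        if systemctl cat rdma.service >/dev/null 2>&1; then
            echo "rdma.service: unit found"
        else
            echo "rdma.service: unit not found"
        fi
    else
        echo "rdma.service: skipped (systemctl not found)"
    fi

    # 3. Assumption, not from the thread: ibv_devinfo (packaged as
    #    libibverbs-utils on RHEL/CentOS) lists RDMA devices visible
    #    to userspace.
    if have ibv_devinfo; then
        if ibv_devinfo >/dev/null 2>&1; then
            echo "ibv_devinfo: at least one RDMA device reported"
        else
            echo "ibv_devinfo: no usable RDMA device reported"
        fi
    else
        echo "ibv_devinfo: skipped (ibv_devinfo not found)"
    fi
}

check_rdma_stack
```

If the first line reports that rdma-core is not installed, that matches 
the "Unit rdma.service could not be found" message quoted below, and the 
package is the first thing to look at.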

On 5/2/19 6:40 PM, Faraz Hussain wrote:
> Thanks. Before I go down the path of installing things willy-nilly, is 
> there some guide I should be following instead? I obviously have a 
> problem with my Mellanox drivers combined with "user error".
>
> So should I be paying Mellanox to help? Or is it a Red Hat issue? Or is 
> it our hardware vendor, HP, who should be involved?
>
> Looks like I need support on how to get support :-)
>
>
> Quoting Christopher Samuel <chris at csamuel.org>:
>
>>> root@lustwzb34:/root # systemctl status rdma
>>> Unit rdma.service could not be found.
>>
>> You're missing this RPM then, which might explain a lot:
>>
>> $ rpm -qi rdma-core
>> Name        : rdma-core
>> Version     : 17.2
>> Release     : 3.el7
>> Architecture: x86_64
>> Install Date: Tue 04 Dec 2018 03:58:16 PM AEDT
>> Group       : Unspecified
>> Size        : 107924
>> License     : GPLv2 or BSD
>> Signature   : RSA/SHA256, Tue 13 Nov 2018 01:45:22 AM AEDT, Key ID 24c6a8a7f4a80eb5
>> Source RPM  : rdma-core-17.2-3.el7.src.rpm
>> Build Date  : Wed 31 Oct 2018 07:10:24 AM AEDT
>> Build Host  : x86-01.bsys.centos.org
>> Relocations : (not relocatable)
>> Packager    : CentOS BuildSystem <http://bugs.centos.org>
>> Vendor      : CentOS
>> URL         : https://github.com/linux-rdma/rdma-core
>> Summary     : RDMA core userspace libraries and daemons
>> Description :
>> RDMA core userspace infrastructure and documentation, including initscripts,
>> kernel driver-specific modprobe override configs, IPoIB network scripts,
>> dracut rules, and the rdma-ndd utility.
>>
>> -- 
>>   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit 
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>

