[Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?
Benson Muite
benson_muite at emailplus.org
Thu May 2 10:35:20 PDT 2019
Hi Faraz,
Mellanox manuals can be found at:
https://docs.mellanox.com/
Example setup instructions (I'm not sure they are correct for you, as I
do not have exact details on your hardware):
https://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_User_Manual_v4_3.pdf
This may also be helpful (students who have participated in cluster
competitions are usually quite good at setting these up):
https://www.slothparadise.com/setting-infiniband-centos-6-7/
If you will be primarily running finite element software, investing
time now in understanding performance analysis can pay off in the
future. Disclosure: I have an interest in seeing more performance tests
on your hardware.
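
Before installing anything, the check Chris describes in the quoted
message below (is rdma-core installed, does the rdma unit exist) can be
scripted. This is only a rough sketch, not a vendor-approved procedure:
the check_cmd helper is made up for illustration, and the choice of
ibstat and ibv_devinfo as the tools to look for is my assumption about
what a working RDMA userspace stack usually provides. If Mellanox OFED
is installed instead of the distribution packages, the results may
differ.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is on PATH.
# Prints "found: NAME" or "missing: NAME".
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Diagnostic tools commonly present with an InfiniBand/RDMA userspace
# stack (assumed list; adjust for your site).
for tool in ibstat ibv_devinfo; do
    check_cmd "$tool"
done

# Is the rdma-core package installed? Only meaningful on RPM-based systems.
if command -v rpm >/dev/null 2>&1; then
    rpm -q rdma-core
fi

# Does systemd know about the rdma unit? Show only the first few lines.
if command -v systemctl >/dev/null 2>&1; then
    systemctl status rdma --no-pager 2>&1 | head -n 3
fi
```

The script only reads state and changes nothing, so it should be safe
to run as a first step before deciding whom to ask for support.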
On 5/2/19 6:40 PM, Faraz Hussain wrote:
> Thanks. Before I go down the path of installing things willy-nilly, is
> there some guide I should be following instead? I obviously have a
> problem with my mellanox drivers combined with "user error"..
>
> So should I be paying Mellanox to help? Or is it a RedHat issue? Or is
> it our hardware vendor, HP, who should be involved??
>
> Looks like I need support on how to get support :-)
>
>
> Quoting Christopher Samuel <chris at csamuel.org>:
>
>>> root@lustwzb34:/root # systemctl status rdma
>>> Unit rdma.service could not be found.
>>
>> You're missing this RPM then, which might explain a lot:
>>
>> $ rpm -qi rdma-core
>> Name : rdma-core
>> Version : 17.2
>> Release : 3.el7
>> Architecture: x86_64
>> Install Date: Tue 04 Dec 2018 03:58:16 PM AEDT
>> Group : Unspecified
>> Size : 107924
>> License : GPLv2 or BSD
>> Signature : RSA/SHA256, Tue 13 Nov 2018 01:45:22 AM AEDT, Key ID 24c6a8a7f4a80eb5
>> Source RPM : rdma-core-17.2-3.el7.src.rpm
>> Build Date : Wed 31 Oct 2018 07:10:24 AM AEDT
>> Build Host : x86-01.bsys.centos.org
>> Relocations : (not relocatable)
>> Packager : CentOS BuildSystem <http://bugs.centos.org>
>> Vendor : CentOS
>> URL : https://github.com/linux-rdma/rdma-core
>> Summary : RDMA core userspace libraries and daemons
>> Description :
>> RDMA core userspace infrastructure and documentation, including
>> initscripts, kernel driver-specific modprobe override configs, IPoIB
>> network scripts, dracut rules, and the rdma-ndd utility.
>>
>> --
>> Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf