<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
</head>
<body>
<div>
<div>
<div>
<div style="direction: ltr;">Hi John,</div>
<div><br>
</div>
<div style="direction: ltr;">I think there is a slight inaccuracy in your mention of HP. From working with a local HP and HPE distributor, I have learned that for servers and related infrastructure you deal with HPE (Hewlett Packard Enterprise), whereas standard consumer hardware is bought from HP. They are two distinct companies focused on different market segments.</div>
<div><br>
</div>
<div style="direction: ltr;">For clusters built on HP servers, has anyone spoken to or dealt with HPE support for this kind of problem?</div>
<div><br>
</div>
<div style="direction: ltr;">Regards,</div>
<div style="direction: ltr;">Jonathan</div>
</div>
<div><br>
</div>
<div class="ms-outlook-ios-signature">
<div style="direction: ltr;">Regards,</div>
<div style="direction: ltr;">Jonathan Aquilina</div>
<div style="direction: ltr;">Owner EagleEyeT</div>
</div>
</div>
<div> </div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Beowulf &lt;beowulf-bounces@beowulf.org&gt; on behalf of John Hearns via Beowulf &lt;beowulf@beowulf.org&gt;<br>
<b>Sent:</b> Thursday, May 2, 2019 6:03 PM<br>
<b>To:</b> Faraz Hussain<br>
<b>Cc:</b> Beowulf Mailing List; Christopher Samuel<br>
<b>Subject:</b> Re: [Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?
<div> </div>
</font></div>
<div dir="ltr">
<div>You ask some damned good questions there.</div>
<div>I will try to answer them from the point of view of someone who has worked as an HPC systems integrator and supported HPC systems,</div>
<div>both for systems integrators and within companies.</div>
<div><br>
</div>
<div>We will start with HP. Did you buy those systems direct from HP as servers, or did you buy a configured HPC system, </div>
<div>complete with Infiniband networking and with a software stack?</div>
<div>If you bought bare metal servers then you are out of luck regarding support, other than hardware failures.</div>
<div>HP now incorporate SGI, and their support is fantastic. Great people work for HP and SGI. But they aren't responsible for your install.</div>
<div><br>
</div>
<div>If however you bought an integrated HPC system this will normally be integrated by a smaller company, usually in your country.</div>
<div>Is this the case here? Then yes the integrator should be providing support.</div>
<div>HOWEVER you have elected to remove their installed OS and upgrade by yourself. If I were the integrator I would give advice,</div>
<div>but refuse to support the upgrade unless it was recommended by us, and you have a continuing support contract.</div>
<div><br>
</div>
<div>You are using CentOS. The CentOS team are great guys - I know the founder quite well, and know people who work for Red Hat.</div>
<div>You have chosen CentOS, a community-supported operating system. Join the CentOS HPC SIG perhaps and ask for help.</div>
<div>But you don't get support from Red Hat, as you are not using Red Hat Enterprise Linux.</div>
<div><br>
</div>
<div>Now we come to Mellanox. Mellanox support is fantastic. Formally, to open a support ticket with them you will need a support agreement</div>
<div>on your switch. You HAVE got a support agreement - right?</div>
<div>If not I have found that informal requests for support are often answered by Mellanox support.</div>
<div><br>
</div>
<div>Failing all of those you could hire me!</div>
<div>(I am being semi-serious here - I am a permanent employee at the moment, but I have worked as an HPC contractor in the past,</div>
<div>and if I could justify it I would prefer to do HPC support on a contract basis).</div>
<div><br>
</div>
</div>
<br>
<div class="gmail_quote">
<div class="gmail_attr" dir="ltr">On Thu, 2 May 2019 at 16:45, Faraz Hussain &lt;<a href="mailto:info@feacluster.com">info@feacluster.com</a>&gt; wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex; padding-left:1ex; border-left-color:rgb(204,204,204); border-left-width:1px; border-left-style:solid">
Thanks. Before I go down the path of installing things willy-nilly, is <br>
there some guide I should be following instead? I obviously have a <br>
problem with my Mellanox drivers combined with "user error".<br>
<br>
So should I be paying Mellanox to help? Or is it a Red Hat issue? Or is <br>
it our hardware vendor, HP, who should be involved?<br>
<br>
Looks like I need support on how to get support :-)<br>
<br>
<br>
Quoting Christopher Samuel &lt;<a href="mailto:chris@csamuel.org" target="_blank">chris@csamuel.org</a>&gt;:<br>
<br>
>> root@lustwzb34:/root # systemctl status rdma<br>
>> Unit rdma.service could not be found.<br>
><br>
> You're missing this RPM then, which might explain a lot:<br>
><br>
> $ rpm -qi rdma-core<br>
> Name : rdma-core<br>
> Version : 17.2<br>
> Release : 3.el7<br>
> Architecture: x86_64<br>
> Install Date: Tue 04 Dec 2018 03:58:16 PM AEDT<br>
> Group : Unspecified<br>
> Size : 107924<br>
> License : GPLv2 or BSD<br>
> Signature : RSA/SHA256, Tue 13 Nov 2018 01:45:22 AM AEDT, Key ID <br>
> 24c6a8a7f4a80eb5<br>
> Source RPM : rdma-core-17.2-3.el7.src.rpm<br>
> Build Date : Wed 31 Oct 2018 07:10:24 AM AEDT<br>
> Build Host : <a href="http://x86-01.bsys.centos.org" target="_blank" rel="noreferrer">
x86-01.bsys.centos.org</a><br>
> Relocations : (not relocatable)<br>
> Packager : CentOS BuildSystem &lt;<a href="http://bugs.centos.org" target="_blank" rel="noreferrer">http://bugs.centos.org</a>&gt;<br>
> Vendor : CentOS<br>
> URL : <a href="https://github.com/linux-rdma/rdma-core" target="_blank" rel="noreferrer">
https://github.com/linux-rdma/rdma-core</a><br>
> Summary : RDMA core userspace libraries and daemons<br>
> Description :<br>
> RDMA core userspace infrastructure and documentation, including initscripts,<br>
> kernel driver-specific modprobe override configs, IPoIB network scripts,<br>
> dracut rules, and the rdma-ndd utility.<br>
><br>
> -- <br>
> Chris Samuel : <a href="http://www.csamuel.org/" target="_blank" rel="noreferrer">
http://www.csamuel.org/</a> : Berkeley, CA, USA<br>
> _______________________________________________<br>
> Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>
> To change your subscription (digest mode or unsubscribe) visit <br>
> <a href="https://beowulf.org/cgi-bin/mailman/listinfo/beowulf" target="_blank" rel="noreferrer">
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf</a><br>
<br>
<br>
<br>
</blockquote>
</div>
</div>
</body>
</html>