[Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

John Hearns hearnsj at googlemail.com
Thu May 2 09:18:25 PDT 2019


Chris, I have to say this. I have worked for smaller companies, and have
worked for cluster integrators.
For big university-sized sites and national labs, the procurement exercise
will end up with a well-defined support arrangement.

I have seen, at one company I worked at, an HPC system arrive which I was
not responsible for.
This system was purchased by the IT department, and was intended to run
Finite Element software.
The hardware came from a Tier 1 vendor, but it was integrated by a small
systems integrator.
Yes, they installed a software stack and demonstrated that it would run
Abaqus.
But beyond that there was no support for getting other applications
running. And no training that I could see in diagnosing faults.

I am not going to name names, but I suspect experiences like that are
common.
Companies want to procure kit for as little as possible. Tier 1 vendors and
white box vendors want to make the sales.
But no-one wants to pay for Bright Cluster Manager, for example.
So the end user gets at best a freeware solution like Rocks, or at worst
some Kickstarted setup which installs an OS,
the CentOS supplied IB drivers and MPI, and Gridengine slapped on top of
that.

This leads to an unsatisfying experience on the part of the end users, and
also for the engineers of the integrating company.

Which leads me to say that we are seeing the rise of HPC cloud services -
AWS, OnScale, Rescale, Verne Global, etc.
And no wonder - you should be getting a much more polished, ready-to-go
infrastructure, even though you can't physically touch it.

On Thu, 2 May 2019 at 17:08, Christopher Samuel <chris at csamuel.org> wrote:

> On 5/2/19 8:40 AM, Faraz Hussain wrote:
>
> > So should I be paying Mellanox to help? Or is it a RedHat issue? Or is
> > it our hardware vendor, HP, who should be involved?
>
> I suspect that would be set out in the contract for the HP system.
>
> The clusters I've been involved in purchasing in the past have always
> required support requests to go via the immediate vendor and they then
> arrange to put you in contact with others where required.
>
> All the best,
> Chris
> --
>    Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>