[Beowulf] Build Recommendations - Private Cluster
Richard Edwards
ejb at fastmail.fm
Wed Aug 21 15:00:41 PDT 2019
Hi Everyone
Thank you all for the feedback and insights.
So I am starting to see a pattern. Some combination of CentOS + Ansible + OpenHPC + SLURM + Old CUDA/Nvidia Drivers ;-).
Sean thank you for those links they will certainly accelerate the journey. (Note to anyone looking you need to remove the “:” at the end of the link else you will get a 404)
Finally, yes I am very aware that the hardware is long in the tooth but it is what I have for the time being. Once my capability out strips the capability of the hardware then I am bound to upgrade. At that point I plan to have a manageable cluster that I can add/remove/upgrade at will :-).
Thanks again to everyone for the responses and insights. Will let you all know how I go over the coming weeks.
Cheers
Richard
> On 22 Aug 2019, at 1:26 am, Sean McGrath <smcgrat at tchpc.tcd.ie> wrote:
>
> Hi guys,
>
> I was on the Programme Committee for the HPC Systems Professionals
> Workshop, HPCSYSPROS18 at Super Computing last year,
> http://sighpc-syspros.org/workshops/2018/index.php.html.
>
> A couple of the submissions I reviewed may be of interest here.
>
> (1) Rapid Deployment of Bare-Metal and In-Container HPC Clusters Using
> OpenHPC playbooks.
>
> This was presented. It is essentially a set of ansible playbooks to get
> a cluster up and running as quickly as possible
>
> From their github, https://github.com/XSEDE/CRI_XCBC:
>
> "This repo will get you to the point of a working slurm installation
> across your cluster. It does not currently provide any scientific
> software or user management options!
>
> The basic usage is to set up the master node with the initial 3 roles
> (pre_ohpc,ohpc_install,ohpc_config) and use the rest to build node
> images, and deploy the actual nodes (these use Warewulf as a
> provisioner by default)."
>
> (2) clusterworks - was not presented at HPCSYSPROS18, it lost out to
> the above marginally but is very similar to the first one above. From
> their https://github.com/clusterworks/inception:
>
> "clusterworks is a toolkit that brings together the best modern
> technologies in order to create fast and flexible turn-key HPC
> environments, deployable on bare-metal infrastructure or in the cloud"
>
> They may be of some use here. Instead of having to start everything
> from scratch you can build on top of those foundations. I don't know
> how current those projects are or if they are still being developed
> though.
>
> Sean
>
>
> On Wed, Aug 21, 2019 at 10:27:41AM -0400, Alexander Antoniades wrote:
>
>> We have been building out a cluster based on commodity servers (mainly
>> Gigabyte motherboards) with 8x1080ti/2080ti per server.
>>
>> We are using a combination of OpenHPC compiled tools and Ansible. I would
>> recommend using the OpenHPC software so you don't have to deal with
>> figuring out what versions of the tools you need to get and manually
>> building them, but I would not go down their prescribed way for building a
>> cluster with base images and all for a small heterogeneous cluster. I would
>> just build the machines as consistently as they can and then use the
>> OpenHPC versions of programs where needed and augment the management with
>> something like ansible or even pdsh.
>>
>> Also unless you're really just doing this an exercise to kill time on
>> weekends, or you literally have no money and can get free power/cooling, I
>> would really consider looking at what other more modern hardware is
>> available, or at least benchmark your system against a sample cloud system
>> if you really want to learn GPU computing.
>>
>> Thanks,
>>
>> Sander
>>
>> On Wed, Aug 21, 2019 at 1:56 AM Richard Edwards <ejb at fastmail.fm> wrote:
>>
>>> Hi John
>>>
>>> No doom and gloom.
>>>
>>> It's in a purpose built workshop/computer room that I have; 42U Rack,
>>> cross draft cooling which is sufficient and 32AMP Power into the PDU???s. The
>>> equipment is housed in the 42U Rack along with a variety of other machines
>>> such as Sun Enterprise 4000 and a 30 CPU Transputer Cluster. None of it
>>> runs 24/7 and not all of it is on at the same time, mainly because of the
>>> cost of power :-/
>>>
>>> Yeah the Tesla 1070???s scream like a banshee???..
>>>
>>> I am planning on running it as power on, on demand setup, which I already
>>> do through some HP iLo and APC PDU Scripts that I have for these machines.
>>> Until recently I have been running some of them as a vSphere cluster and
>>> others as standalone CUDA machines.
>>>
>>> So that???s one vote for OpenHPC.
>>>
>>> Cheers
>>>
>>> Richard
>>>
>>> On 21 Aug 2019, at 3:45 pm, John Hearns via Beowulf <beowulf at beowulf.org>
>>> wrote:
>>>
>>> Add up the power consumption for each of those servers. If you plan on
>>> installing this in a domestic house or indeed in a normal office
>>> environment you probably wont have enough amperage in the circuit you
>>> intend to power it from.
>>> Sorry to be all doom and gloom.
>>> Also this setup will make a great deal of noise. If in a domestic setting
>>> put it in the garage.
>>> In an office setting the obvious place is a comms room but be careful
>>> about the ventilation.
>>> Office comms rooms often have a single wall mounted air conditioning unit.
>>> Make SURE to run a temperature shutdown script.
>>> This air con unit WILL fail over a weekend.
>>>
>>> Regarding the software stack I would look at OpenHPC. But that's just me.
>>>
>>>
>>>
>>>
>>>
>>> On Wed, 21 Aug 2019 at 06:09, Dmitri Chubarov <dmitri.chubarov at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> this is a very old hardware and you would have to stay with a very
>>>> outdated software stack as 1070 cards are not supported by the recent
>>>> versions of NVIDIA Drivers and old versions of NVIDIA drivers do not play
>>>> well with modern kernels and modern system libraries.Unless you are doing
>>>> this for digital preservation, consider dropping 1070s out of the equation.
>>>>
>>>> Dmitri
>>>>
>>>>
>>>> On Wed, 21 Aug 2019 at 06:46, Richard Edwards <ejb at fastmail.fm> wrote:
>>>>
>>>>> Hi Folks
>>>>>
>>>>> So about to build a new personal GPU enabled cluster and am looking for
>>>>> peoples thoughts on distribution and management tools.
>>>>>
>>>>> Hardware that I have available for the build
>>>>> - HP Proliant DL380/360 - mix of G5/G6
>>>>> - HP Proliant SL6500 with 8 GPU
>>>>> - HP Proliant DL580 - G7 + 2x K20x GPU
>>>>> -3x Nvidia Tesla 1070 (4 GPU per unit)
>>>>>
>>>>> Appreciate people insights/thoughts
>>>>>
>>>>> Regards
>>>>>
>>>>> Richard
>>>>> _______________________________________________
>>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>>>> To change your subscription (digest mode or unsubscribe) visit
>>>>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>>>
>>>> _______________________________________________
>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>>> To change your subscription (digest mode or unsubscribe) visit
>>>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>>
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit
>>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>
>>>
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit
>>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>
>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
>
> --
> Sean McGrath M.Sc
>
> Systems Administrator
> Trinity Centre for High Performance and Research Computing
> Trinity College Dublin
>
> sean.mcgrath at tchpc.tcd.ie
>
> https://www.tcd.ie/
> https://www.tchpc.tcd.ie/
>
> +353 (0) 1 896 3725
>
More information about the Beowulf
mailing list