[Beowulf] cluster deployment and config management

Douglas Eadline deadline at eadline.org
Tue Sep 5 14:26:30 PDT 2017


> Hey everyone, .. any idea what happened with perceus?
> http://www.linux-mag.com/id/6386/
> https://github.com/perceus/perceus
>
> .. yeah; what happened with Arthur Stevens (Perceus, GravityFS/OS Green
> Provisioning, etc.)? Where is he now, and who, if anyone, is maintaining
> Perceus?


I would suggest you talk to Arthur Stevens.
>
> .. and come on Greg K. ... we know you are lurking there somewhere, busy
> with singularity
> http://singularity.lbl.gov/ (kudos .. great job as always !!!)
> .. wasn't Perceus your original baby?

Perceus was based on Warewulf; the current Warewulf is 3.6
and works quite well.


> https://gmkurtzer.github.io/
> .. can you shed some light on what happened with the perceus project? ..
> I'd love to see it integrated with singularity -- that would make my
> day/month/year  !!!!!!

You can do that now.
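For example, once Warewulf has a node provisioned and booted, running a
Singularity container on it is just a shell command (assuming Singularity
2.x is installed in the node image; the image path below is made up):

    # on a Warewulf-provisioned compute node (hypothetical image path)
    singularity exec /shared/containers/centos7.img cat /etc/os-release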

--
Doug


>
> thanks!
> cheers,
> psc
>
> p.s. .. there used to be Rocks Clusters (not sure about its status
> these days)
> http://www.rocksclusters.org/wordpress/
>
> p.p.s. .. I'd say Warewulf is the "best" bet in most cases .. why keep
> reinventing the wheel?
>
>
> On 09/05/2017 01:43 PM, beowulf-request at beowulf.org wrote:
>>
>> Message: 1
>> Date: Tue, 5 Sep 2017 08:20:03 -0400
>> From: Joe Landman <joe.landman at gmail.com>
>> To: beowulf at beowulf.org
>> Subject: Re: [Beowulf] cluster deployment and config management
>>
>> Good morning ...
>>
>>
>> On 09/05/2017 01:24 AM, Stu Midgley wrote:
>>> Morning everyone
>>>
>>> I am in the process of redeveloping our cluster deployment and config
>>> management environment and wondered what others are doing?
>>>
>>> First, everything we currently have is basically home-grown.
>> Nothing wrong with this, if it adequately solves the problem.  Many of
>> the frameworks people use for these things are highly opinionated, and
>> often, you'll find their opinions grate on your expectations.  At
>> $dayjob-1, I developed our own kit precisely because so many of the
>> other toolkits got things wrong, from small details to big design
>> decisions; not simply as a matter of opinion, but actual errors that
>> the developers glossed over because that aspect was unimportant to
>> them ... while being of critical importance to me and my customers at
>> the time.
>>
>>> Our cluster deployment is a system that I've developed over the years
>>> and is pretty simple - if you know bash and how PXE booting works.  It
>>> has everything from setting the correct parameters in the BIOS, ZFS
>>> RAM disks for the OS, and Lustre for state files (usually in /var) -
>>> all in the initrd.
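PXE booting itself needs little more than a DHCP/TFTP server pointed at a
bootloader.  A minimal dnsmasq sketch of that piece (dnsmasq is just one
option, and the addresses and paths here are made up):

    # minimal dnsmasq PXE setup (illustrative values only)
    interface=eth0
    dhcp-range=10.0.0.100,10.0.0.200,12h
    dhcp-boot=pxelinux.0
    enable-tftp
    tftp-root=/srv/tftp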
>>>
>>> We use it to boot cluster nodes, lustre servers, misc servers and
>>> desktops.
>>>
>>> We basically treat everything like a cluster.
>> The most competent baked distro out there for this was (in the past,
>> haven't used it recently) Warewulf.  See https://github.com/warewulf/ .
>> Still under active development, and Greg and team do a generally great
>> job.  Least opinionated distro around, most flexible, and some of the
>> best tooling.
>>
>>> However... we do have a proliferation of images... and all need to be
>>> kept up-to-date and managed.  Most of the changes from one image to
>>> the next are config files.
>> Ahhh ... One of the things we did with our toolchain (it is open source,
>> I've just never pushed it to github) was to completely separate booting
>> from configuration.  That is, units booted to an operational state
>> before we applied configuration.  This was in part due to long
>> experience with nodes hanging during bootup with incorrect
>> configurations.  If you minimize the chance for this, your nodes
>> (barring physical device failure) always boot.  The only specific
>> opinion we had w.r.t. this system was that the nodes had to be bootable
>> via PXE, and therefore a working DHCP server needed to exist on the
>> network.
>>
>> We drove post-boot configuration via a script that downloaded and
>> launched other scripts.  Since we PXE booted, network addresses were
>> fine.  We didn't even enforce final network address determination at
>> PXE startup.
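A minimal sketch of that pull-and-run pattern (the config server URL and
paths are hypothetical, not from Joe's actual toolchain):

    #!/bin/bash
    # post-boot bootstrap: fetch this node's config scripts and run them in order
    CONF="http://conf.example.com"        # hypothetical config server
    NODE=$(hostname -s)

    for s in $(curl -sf "${CONF}/nodes/${NODE}/manifest"); do
        curl -sf "${CONF}/scripts/${s}" -o "/tmp/${s}" && bash "/tmp/${s}"
    done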
>>
>> We looked at the booting process as a state machine.  The lowest level
>> was raw hardware, no power.  Subsequent levels were BIOS POST, PXE boot
>> of the kernel, and the configuration phase.  During that phase *everything*
>> was on the table w.r.t. changes.  We could (and did) alter networking,
>> using programmatic methods, databases, etc. to determine and configure
>> final network configs.  Same for disks, and other resources.
>>
>> Configuration changes could be pushed post boot by updating a script and
>> either pushing (not normally recommended for clusters of reasonable
>> size) or triggering a pull/run cycle for that script/dependencies.
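Triggering such a pull/run cycle across a set of nodes is a one-liner if
something like pdsh is available (the node range and script path are
illustrative):

    # ask nodes 001-128 to re-run their post-boot config pull (hypothetical script)
    pdsh -w node[001-128] /usr/local/sbin/pull-config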
>>
>> This allowed us to update images and configuration asynchronously.
>>
>> We had to manage images, but this turned out to be generally simple.  I
>> was in the midst of putting image mappings into a distributed object
>> store when the company died.  Config store is similarly simple, again
>> using the same mechanisms, and could be driven entirely
>> programmatically.
>>
>> Of course, for the chef/puppet/ansible/salt/cloudformation/... people,
>> we could drive their process as well.
>>
>>
>>> We don't have good config management (which might, hopefully, reduce
>>> the number of images we need).  We tried puppet, but it seems everyone
>>> hates it.  Is it too complicated?  Not the right tool?
>> Highly opinionated config management is IMO (and yes, I am aware this is
>> redundant humor) generally a bad idea.  Config management that gets out
>> of your way until you need it is the right approach. Which is why we
>> never tried to dictate what config management our users would use.  We
>> simply handled getting the system up to an operational state, and they
>> could use ours, theirs, or Frankensteinian kludges.
>>
>>> I was thinking of using git for config files, dumping a list of RPMs,
>>> dumping the active services from systemd, and somehow munging all that
>>> together in the initrd.  i.e. git checkout the server to get config
>>> files and systemctl enable/start the appropriate services, etc.
>>>
>>> It started to get complicated.
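As a rough sketch of that idea (the repo URL and layout are invented for
illustration), the post-boot step could look something like:

    #!/bin/bash
    # check out this host's config branch from git, overlay it, enable its services
    HOST=$(hostname -s)
    REPO="https://config.example.com/node-configs.git"    # hypothetical repo

    git clone --depth 1 --branch "${HOST}" "${REPO}" /tmp/cfg
    rsync -a /tmp/cfg/etc/ /etc/
    while read -r svc; do
        systemctl enable --now "${svc}"
    done < /tmp/cfg/services.txt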
>>>
>>> Any feedback/experiences appreciated.  What works well?  What doesn't?
>> IMO things that tie together config and booting are problematic at
>> scale.  That leads to nearly unmanageable piles of images, as you've
>> experienced.  Booting to an operational state, and applying all config
>> post-boot (ask me about my fstab replacement some day), makes for a very
>> nice operational solution that scales wonderfully ... you can replicate
>> images to local image servers if you wish, replicate config servers, and
>> load balance the whole thing to whatever scale you need.
>>
>>
>>> Thanks.
>>>
>>>
>>>
>>> --
>>> Dr Stuart Midgley
>>> sdm900 at gmail.com <mailto:sdm900 at gmail.com>
>>>
>>>
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>>> Computing
>>> To change your subscription (digest mode or unsubscribe) visit
>>> http://www.beowulf.org/mailman/listinfo/beowulf
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
>


-- 
Doug



