[Beowulf] cluster deployment and config management

Rémy Dernat remy.dernat at umontpellier.fr
Tue Sep 5 04:52:50 PDT 2017


On 05/09/2017 at 08:57, Carsten Aulbert wrote:
> Hi
> On 09/05/17 08:43, Stu Midgley wrote:
>> Interesting.  Ansible has come up a few times.
>> Our largest cluster is 2000 KNL nodes and we are looking towards 10k...
>> so it needs to scale well :)
> We went with ansible at the end of 2015 until we hit a road block with
> it not using a client daemon a few months later. When having a few 1000
> states to perform on each client, the lag for initiating the next state
> centrally from the server was quite noticeable - in the end a single run
> took more than half an hour without any changes (for a single host!).
> After that we re-evaluated with salt stack being the outcome scaling
> well enough for our O(2500) clients.
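
To put a rough number on that lag, here is a back-of-the-envelope sketch. 
The figures are assumptions read off the description above ("more than 
half an hour" taken as 1800 s, "a few 1000 states" taken as 3000), not 
measurements, and the function name is only illustrative:

```python
def per_state_overhead(run_seconds: float, n_states: int) -> float:
    """Average wall-clock seconds spent per state in a run with no changes."""
    if n_states <= 0:
        raise ValueError("n_states must be positive")
    return run_seconds / n_states


# Assumed inputs: a 30-minute no-change run over 3000 states on one host.
overhead = per_state_overhead(1800, 3000)
print(f"{overhead:.2f} s of overhead per state")
```

Under those assumptions the cost is well under a second per state, so the 
problem is the per-state round trip multiplied by thousands of states, not 
any single slow state.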

+1 for SaltStack here. It performs very well on large infrastructures 
(see the documentation: 
https://docs.saltstack.com/en/latest/topics/tutorials/intro_scale.html ) 
and supports complex rules with reactors and orchestrators (including some 
ways to handle post-reboot steps and reconnections).

There is also a GitHub project that deploys a cluster from scratch with 
SaltStack, on a CentOS base, with PXE, DHCP, DNS, 
kickstart... :

Personally, I will use SaltStack with FAI ( https://fai-project.org/ ) to 
deploy my nodes (it works, but it needs some additional testing). Or 
maybe I will switch to banquise, but for now that project is still a 
bit too young and I need a Debian base OS (I know it is planned, 
pending preseed config management through Salt). I am using 
gitfs as a SaltStack backend, and I also keep some config files in 
another git repository (e.g. environment modules files).
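
For anyone who has not used gitfs, here is a minimal sketch of the master 
configuration it needs, generated from Python. The repository URL and the 
helper name are hypothetical, not my actual setup; `fileserver_backend` 
and `gitfs_remotes` are the standard master options, though older Salt 
releases may expect `git` instead of `gitfs` as the backend name, so 
check the documentation for your version:

```python
def render_master_config(remotes):
    """Return YAML text that enables gitfs alongside the local roots backend."""
    if not remotes:
        raise ValueError("at least one gitfs remote is required")
    lines = [
        "fileserver_backend:",
        "  - gitfs",
        "  - roots",
        "gitfs_remotes:",
    ]
    # One list entry per git repository holding Salt states.
    lines += [f"  - {url}" for url in remotes]
    return "\n".join(lines) + "\n"


# Hypothetical repository URL, for illustration only.
config_text = render_master_config(["https://git.example.org/hpc/salt-states.git"])
print(config_text)
```

The output would typically go into a file under /etc/salt/master.d/ (the 
exact path is an assumption about the install), followed by a restart of 
the salt-master service.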

Best regards,

> Note, I have not tracked if and how ansible progressed over the past
> ~2yrs which may or may not exhibit the same problems today.
> Cheers
> Carsten
