[Beowulf] cluster deployment and config management
Gavin W. Burris
bug at wharton.upenn.edu
Tue Sep 12 09:07:42 PDT 2017
We use a minimal PXE kickstart for hardware nodes, then Ansible after that. It is a thing of beauty to have the entire cluster defined in one Git repo. This also lends itself to configuring cloud node images with the exact same code. Reusable roles and conditionals, FTW!
With regard to scaling, Ansible by default forks only 5 parallel processes. This can be scaled way up, maybe to hundreds at a time. If there are thousands of states / idempotent plays to run on a single host, those are going to take some time regardless of the configuration language, correct? A solution would be to tag the plays and run only the required tags during an update, versus a full run on fresh installs. The fact caching feature may help here. SSH pipelining and accelerated mode are newer features, too, which reduce the number of new connections required, a big time saver.
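As a rough sketch of the knobs mentioned above, an ansible.cfg along these lines raises the fork count, caches facts between runs, and enables SSH pipelining. The values and the cache path are illustrative assumptions, not our production settings, and pipelining generally requires that requiretty be disabled in sudoers on the nodes.

```ini
[defaults]
# Number of parallel worker processes; 100 is an illustrative value
forks = 100
# Only gather facts when they are not already in the cache
gathering = smart
# Cache facts as JSON files on the control host
fact_caching = jsonfile
# Hypothetical cache directory; adjust to taste
fact_caching_connection = /tmp/ansible_fact_cache
# Keep cached facts for one day (seconds)
fact_caching_timeout = 86400

[ssh_connection]
# Reuse one SSH session per task instead of copying files first
pipelining = True
```

A tagged update run would then look something like "ansible-playbook site.yml --tags update", where site.yml and the tag name are placeholders for whatever the repo actually defines.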
On Tue 09/05/17 02:57AM EDT, Carsten Aulbert wrote:
> On 09/05/17 08:43, Stu Midgley wrote:
> > Interesting. Ansible has come up a few times.
> > Our largest cluster is 2000 KNL nodes and we are looking towards 10k...
> > so it needs to scale well :)
> We went with ansible at the end of 2015 until we hit a road block with
> it not using a client daemon after a few months. When having a few 1000
> states to perform on each client, the lag for initiating the next state
> centrally from the server was quite noticeable - in the end a single run
> took more than half an hour without any changes (for a single host!).
> After that we re-evaluated with salt stack being the outcome scaling
> well enough for our O(2500) clients.
> Note, I have not tracked if and how ansible progressed over the past
> ~2yrs which may or may not exhibit the same problems today.
> Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
> Callinstraße 38, 30167 Hannover, Germany
> Phone: +49 511 762 17185
Gavin W. Burris
Senior Project Leader for Research Computing
The Wharton School
University of Pennsylvania
Search our documentation: http://research-it.wharton.upenn.edu/about/
Subscribe to the Newsletter: http://whr.tn/ResearchNewsletterSubscribe