[Beowulf] Cluster stack based on Ansible. Looking for feedback.
oxedions at gmail.com
Thu Sep 12 14:00:00 PDT 2019
We are working on a new, fully Ansible-based cluster stack, and because it
is reaching a stable state, we think it would be a good idea to present it
to the Beowulf community.
I have tried to provide as much relevant information as possible in this mail.
*Origin of the need:*
We are building a small HPC center in a FabLab in Auray, France (
https://en.wikipedia.org/wiki/Auray). We needed something simple to manage
our cluster and our workstations, but also very flexible, as we work
with enterprises and universities (some want CentOS, others Ubuntu, etc.).
We also wanted something with as few scripts as possible, to keep the
product easy to maintain and to spend less time managing the cluster,
leaving spare time for doing interesting things.
After an extensive search, Ansible was chosen for its simplicity, and we
iterated over and over to converge on this stack.
The stack is fully open, under the MIT license. We do not sell anything; it is free, best-effort work.
*What it can do:*
In its current state, the stack can deploy on CentOS 7.6 and RHEL 8.0 (and
nearly on Ubuntu 18.04):
- An /etc/hosts file
- A dhcp configuration with optional shared networks
- A DNS configuration based on Bind (server/client)
- A time configuration based on Chrony (server/client)
- A full PXE stack, designed for simplicity and verbosity
o TFTP based on atftp
o Apache for all repositories/files
o An advanced iPXE stack with menus to handle all exotic hardware
§ EFI / legacy BIOS
§ PXE / native iPXE ROM
§ CD or USB boot into PXE when no native PXE is available (or it is poorly implemented)
- Repositories server/client
- NIC configuration (basic for now, based on the nmcli Ansible module)
- Rsyslog (server/client, systemd split files)
- NFS (server/client)
And as addons:
- Slurm (basic configuration, master/nodes)
- Clustershell groups
- Basic OpenLDAP with phpLDAPadmin (currently insecure)
- Very basic user management for very small clusters *with a
single login node*, as a replacement for LDAP
- Prometheus configuration (Prometheus, Alertmanager,
NodeExporter, with a basic configuration and already a few alerts)
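To give a flavour of what driving such a stack could look like, here is a minimal inventory sketch (host names, group names, and the playbook name are hypothetical illustrations, not the project's actual layout):

```yaml
# inventory/cluster.yml -- hypothetical example layout
all:
  children:
    management:
      hosts:
        mgmt1:
          ansible_host: 10.0.0.1
    computes:
      hosts:
        compute[01:14]:      # Ansible host-range pattern: compute01..compute14
    login:
      hosts:
        login1:
```

With an inventory like this, one would typically run something like `ansible-playbook -i inventory/cluster.yml site.yml --limit computes` (again, `site.yml` is an assumed name) to apply the roles to a subset of hosts.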
The stack can also deploy Ubuntu 18.04 and OpenSUSE Leap 15.1 via PXE, but
not all Ansible roles can yet be deployed on them after OS deployment (we are
implementing Ubuntu support right now for our workstations).
*The stack is fully modular*. Any new role can be created, and any new data
can be added to the inventory. We worked hard to keep the roles fully
independent (you can replace DHCP/DNS/PXE with Cobbler and the other roles do
not care; you can replace Slurm with another job scheduler; and so on).
Note to Ansible users: we are using the “merge” hash_behaviour. With it, the
stack can cover simple but also very complex clusters (parameters can be
targeted at specific host(s), whether for production configuration or simply to experiment).
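For readers unfamiliar with this Ansible setting: with `hash_behaviour = merge` in `ansible.cfg`, dictionary variables defined at several inventory levels are merged key by key instead of being replaced wholesale, which is what lets a host-level entry override a single parameter. A minimal sketch, with illustrative variable names that are not the project's actual ones:

```yaml
# group_vars/all.yml -- cluster-wide defaults (illustrative names)
dns:
  server: 10.0.0.1
  domain: cluster.local

# host_vars/compute07.yml -- override a single key for one host.
# With the default "replace" behaviour this definition would wipe out
# dns.domain as well; with "merge" the host ends up with
# server: 10.0.0.2 and domain: cluster.local.
dns:
  server: 10.0.0.2
```

The merge behaviour is enabled in `ansible.cfg` with `hash_behaviour = merge` under the `[defaults]` section.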
The stack also has a few native mechanisms under the hood, ready and tested as
proofs of concept, that we are now integrating. These features answer needs of
our own:
- Multi-iceberg (sometimes called multi-island in HPC)
o When there is a need to separate parts of the cluster (our case: to
provide dedicated small sub-clusters to some enterprises), the stack is able to
split hosts into icebergs, each iceberg being managed by a group of
management nodes and isolated from the others (but reachable through the
interconnect if one exists and access is requested).
o Because Ansible relies on groups and SSH, we can achieve this simply.
o This feature would allow the stack to handle an old cluster alongside a new
one, keeping both in production, or perhaps to scale to very large configurations.
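Since Ansible targets hosts purely through inventory groups and SSH, an iceberg split can be expressed as plain groups; a hedged sketch with hypothetical names (not the stack's actual group layout):

```yaml
# Each iceberg gets its own management and compute groups;
# all names below are illustrative.
all:
  children:
    iceberg1:
      children:
        iceberg1_management:
          hosts:
            mgmt1:
        iceberg1_computes:
          hosts:
            c1-[01:08]:
    iceberg2:
      children:
        iceberg2_management:
          hosts:
            mgmt2:
        iceberg2_computes:
          hosts:
            c2-[01:06]:
```

Each management node could then run the same playbooks restricted to its own island, e.g. with `--limit iceberg1`, keeping the icebergs isolated from one another.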
- Accelerated modes
o Ansible is famous for its simplicity, but also for being slow.
o Accelerated mode is a proof of concept we designed to heavily accelerate
critical template rendering. It is based on some analysis we made of Ansible.
It also reduces memory usage.
o Accelerated mode is just a basic trick we found in the inventory.
Note that for us, it is mandatory that the stack stays simple. These
features are not activated by default, keeping the Ansible inventory very
simple for basic usage.
Last point: the stack does not target HPC in particular; it is generic and can
adapt to HPC clusters, but also to enterprise or university IT networks
(workstations, laptops, etc.).
*What we are doing with it:*
We are managing our 14 thin (32 GB RAM / 16 cores) + 1 fat (1 TB RAM / 64
cores) Supermicro servers, and also a few workstations.
We are a FabLab made up of science and technology fans, so not a lot of money,
which means we gather “old” equipment (we are on Sandy Bridge right
now) for our cluster. This is why we needed such a flexible stack: to be able
to handle exotic hardware.
*The future of the stack:*
The main future objective is to provide a base for a *modular* and *simple*
stack, for simple clusters or just for testing/development on parts of large
clusters. This stack has (we think) a nice PXE mechanism, and all the basics
needed. It could be used as a base for other things, like a skeleton missing
flesh. Anyone could add new roles/modules/tools to it, keeping in mind
simplicity and role independence.
We are right now working on the Ubuntu 18.04 implementation, and we hope to
release version 1.0 soon.
*Where to find:*
You can find the stack on GitHub here:
Documentation is in resources/documentation/_build/html (the _build/html
directory will soon be removed from the repository and hosted somewhere, as
only documentation sources should be here).
The few packages are still not online, but we can provide them to anyone
wishing to test the stack.
The stack is young, so we are looking for any feedback (positive and/or
negative). Feel free to have a look. For any details, please do not hesitate
to contact us :-)
Thank you for reading this very long and boring mail.
With our best regards
Oxedions and Johnny Keats