[Beowulf] Most common cluster management software, job schedulers, etc?
prentice.bisbal at rutgers.edu
Wed Mar 9 11:18:00 PST 2016
On 03/08/2016 11:16 AM, Remy Dernat wrote:
> On 08/03/2016 09:25, Carsten Aulbert wrote:
>> On 03/08/2016 05:43 AM, Jeff Friedman wrote:
>>> Hello all. I am just entering the HPC Sales Engineering role, and would
>>> like to focus my learning on the most relevant stuff. I have searched
>>> near and far for a current survey of some sort listing the top used
>>> “stacks”, but cannot seem to find one that is free. I was breaking
>>> things down similar to this:
>> "relevant" stuff is pretty relative to what you want to achieve ;)
>>> _Provisioning software_: Cobbler, Warewulf, xCAT, Openstack,
>>> Platform HPC, ?
> Well, OpenStack is designed for cloud, not for HPC, but perhaps some
> people are using OpenStack for that purpose...
> You could add RocksCluster, sidus (
> ), kadeploy ( http://kadeploy3.gforge.inria.fr/ ), perceus (
> http://moo.nac.uci.edu/~hjm/Perceus-Report.html )...
>> In case of Debian: FAI
> You can also use FAI to deploy non-Debian-like systems. I use it to
> deploy Ubuntu, but you can also deploy Red Hat-like systems, although
> that is considerably harder. Only the initial boot environment (served
> through DHCP/PXE, then NFS) is Debian (the nfsroot); after that it can
> install whatever you need.
>>> _Configuration management_: Warewulf, Puppet, Chef, Ansible, ?
> + SaltStack ?
> Generally, people are not using that kind of stuff in HPC, but yes, it
> could happen.
Says you! ;)
I used to just do my cluster configuration using a postinstall script in
Kickstart (as you mention below), but once I started using Puppet for my
non-cluster systems, it made little sense to use two different
configuration management methodologies within the enterprise, so I
switched to just calling 'puppet agent' from the postinstall script.
The only difference is that, to reduce overhead, I don't keep the puppet
agent daemons running on the compute nodes; I used gsh to run 'puppet
agent' on demand. Nowadays, I'd use pdsh instead of gsh.
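
To make the on-demand approach concrete, here is a rough sketch in Python of
how a one-shot Puppet run could be pushed out with pdsh. This is only an
illustration, not my actual setup: the function names, the node range
"node[001-016]", and the fanout value are all made up, and it assumes pdsh
and the puppet agent are already installed and that the remote command runs
with sufficient privileges. The default is a dry run that only prints the
command.

```python
import shlex
import subprocess


def build_pdsh_puppet_command(nodes, fanout=32):
    """Return the argv list for a one-shot Puppet run on `nodes` via pdsh.

    `nodes` is a list of hostnames or pdsh host ranges (hypothetical names).
    `fanout` is passed to pdsh -f to limit concurrent connections.
    """
    # --onetime --no-daemonize: apply the catalog once and exit, so no
    # agent daemon is left running on the compute node.
    remote = ["puppet", "agent", "--onetime", "--no-daemonize"]
    return ["pdsh", "-f", str(fanout), "-w", ",".join(nodes), shlex.join(remote)]


def run_puppet_on_demand(nodes, dry_run=True):
    """Print the pdsh command (dry run) or execute it and return its exit code."""
    argv = build_pdsh_puppet_command(nodes)
    if dry_run:
        print(shlex.join(argv))
        return 0
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    # Hypothetical node range; replace with your own host list.
    run_puppet_on_demand(["node[001-016]"])
```

Call run_puppet_on_demand(..., dry_run=False) to actually execute it. The
same 'puppet agent' invocation is what you would put in the Kickstart
postinstall script for the initial configuration.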
> You have some images to install the whole cluster, and possibly a
> post-configuration step (Kickstart?), if you have a diskfull
> Then, you can use a cluster-command tool like cluster-ssh, pssh, pdsh...
>> old school: cfengine, ...
>>> _Resource and job schedulers_: I think these are basically the same
>>> thing? Torque, Lava, Maui, Moab, SLURM, Grid Engine, Son of Grid
>>> Univa, Platform LSF, etc… others?
> + OAR https://oar.imag.fr/
> Some job schedulers are also resource managers, but that is not always
> true:
>> for high throughput computing: HTCondor
>>> _Performance monitoring_: Ganglia, Nagios, ?
>> Icinga, ...
> Shinken, Zabbix, etc... There are also some newer tools built on other
> storage and display technologies (InfluxDB, Graphite, Grafana...)...
> But for HPC, Ganglia is good enough...
> PS : good to know this "small hpc" google site.
>>> Does anyone have any observations as to which of the above are the most
>>> common? Or is that too broad? I believe most of the clusters I will be
>>> involved with will be in the 128 - 2000 core range, all on commodity
>> I guess everyone will have their preferences, if you wanted to get to
>> some hard, recent numbers, one way would be to create an online
>> survey/form and ask many people to participate :)
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing