[Beowulf] Most common cluster management software, job schedulers, etc?

Remy Dernat remy.dernat at univ-montp2.fr
Tue Mar 8 08:16:24 PST 2016


Hi,

On 08/03/2016 09:25, Carsten Aulbert wrote:
> Hi
>
> On 03/08/2016 05:43 AM, Jeff Friedman wrote:
>> Hello all. I am just entering the HPC Sales Engineering role, and would
>> like to focus my learning on the most relevant stuff. I have searched
>> near and far for a current survey of some sort listing the top used
>> “stacks”, but cannot seem to find one that is free. I was breaking
>> things down similar to this:
> "relevant" stuff is pretty relative to what you want to achieve ;)
>> _Provisioning software_: Cobbler, Warewulf, xCAT, Openstack, Platform HPC, ?
Well, OpenStack is designed for cloud computing, not for HPC, though 
perhaps some people are using OpenStack for that purpose...

You could add RocksCluster, Sidus 
(http://www.cbp.ens-lyon.fr/doku.php?id=en:developpement:productions:sidus), 
Kadeploy (http://kadeploy3.gforge.inria.fr/), Perceus 
(http://moo.nac.uci.edu/~hjm/Perceus-Report.html)...
>>
> In case of Debian: FAI

You can also use FAI to deploy non-Debian systems. I use it to deploy 
Ubuntu, but you can also deploy Red Hat-like systems, even if that is 
somewhat harder. Only the initial boot system (loaded through DHCP/PXE 
and then NFS) is a Debian nfsroot; after that, it can install whatever 
you need.
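
As a rough illustration of the DHCP/PXE chain mentioned above, an ISC 
dhcpd entry pointing nodes at a TFTP boot server could look like the 
following sketch (subnet, addresses and file names are made-up examples, 
not taken from any real FAI setup):

```
# /etc/dhcp/dhcpd.conf fragment -- illustrative values only
subnet 192.168.1.0 netmask 255.255.255.0 {
  range 192.168.1.100 192.168.1.200;
  next-server 192.168.1.1;   # TFTP server handing out the PXE loader
  filename "pxelinux.0";     # PXE bootloader; FAI then mounts its nfsroot
}
```
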
>
>> _Configuration management_: Warewulf, Puppet, Chef, Ansible, ?
>>
+ SaltStack ?

Generally, people do not use that kind of tool in HPC, but yes, it 
can happen.

Typically you have images to install the whole cluster, and possibly a 
post-configuration step (kickstart?) if you have a diskful configuration.
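
As a sketch of such a post-configuration step, a minimal kickstart file 
for a Red Hat-like compute node could look like this (every value here 
is illustrative, not from any particular site):

```
# node.ks -- illustrative kickstart fragment
lang en_US.UTF-8
keyboard us
timezone UTC
rootpw --plaintext changeme
bootloader --location=mbr
clearpart --all --initlabel
autopart

%packages
@core
%end

%post
# site-specific post-configuration goes here
# (mount /home, register with the scheduler, ...)
%end
```
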

Then you can use a cluster command tool like clusterssh, pssh, pdsh...
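
For example, pdsh can run a command across a node range in one go, and 
pssh can do the same from a host file (node names here are hypothetical):

```
# run a command on node01..node16 in parallel (pdsh host-range syntax)
pdsh -w node[01-16] uptime

# pssh equivalent, reading hosts from a file, with inline output
pssh -h hosts.txt -i uptime
```
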
> old school: cfengine, ...
>
>> _Resource and job schedulers_: I think these are basically the same
>> thing? Torque, Lava, Maui, Moab, SLURM, Grid Engine, Son of Grid Engine,
>> Univa, Platform LSF, etc… others?
+ OAR https://oar.imag.fr/

Some job schedulers are also resource managers, but that is not always 
the case: 
https://wiki.hpcc.msu.edu/display/hpccdocs/Resource+Managment+and+Job+Scheduler
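
To make the distinction concrete, here is a minimal SLURM batch script: 
SLURM both allocates the resources (resource manager) and decides when 
the job runs (scheduler). Job name and limits are illustrative:

```
#!/bin/bash
#SBATCH --job-name=example   # illustrative job name
#SBATCH --ntasks=4           # request 4 tasks (resource management side)
#SBATCH --time=00:10:00      # wall-clock limit used by the scheduler
srun hostname                # run one copy per allocated task
```
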

> for high throughput computing: HTCondor
>
>> _Performance monitoring_: Ganglia, Nagios, ?
> Icinga, ...
Shinken, Zabbix, etc. There are also some newer tools built on other 
storage and display technologies (InfluxDB, Graphite, Grafana...).

But for HPC, Ganglia is good enough...
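
For instance, the Graphite-style tools ingest simple time-stamped 
metrics; Graphite's plaintext protocol can be fed with nothing more than 
a shell one-liner (server name and metric path below are made up):

```
# send one load-average sample to a Graphite server
# (plaintext protocol: "path value timestamp", TCP port 2003)
echo "cluster.node01.load1 0.42 $(date +%s)" | nc graphite.example.org 2003
```
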

Best,
Remy.

PS: good to know about this "small hpc" Google site.
>
>> Does anyone have any observations as to which of the above are the most
>> common?  Or is that too broad?  I believe most of the clusters I will be
>> involved with will be in the 128 - 2000 core range, all on commodity
>> hardware.
> I guess everyone will have their preferences, if you wanted to get to
> some hard, recent numbers, one way would be to create an online
> survey/form and ask many people to participate :)
>
> Cheers
>
> Carsten
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

-- 
Rémy Dernat
Research Engineer (Ingénieur d'Études)
MBB/ISE-M


