[Beowulf] Most common cluster management software, job schedulers, etc?

Jeff Friedman jeff.friedman at siliconmechanics.com
Mon Mar 7 20:43:59 PST 2016

Hello all. I am just entering an HPC Sales Engineering role and would like to focus my learning on the most relevant material. I have searched near and far for a current survey listing the most widely used “stacks”, but cannot seem to find one that is free. I have been breaking things down roughly like this:

OS distro: CentOS, Debian, TOSS, etc.? I know some come trimmed down and also include specific HPC libraries, like CNL, CNK, INK?

MPI options: MPICH2, MVAPICH2, Open MPI, Intel MPI, ? 

Provisioning software: Cobbler, Warewulf, xCAT, OpenStack, Platform HPC, ?

Configuration management: Warewulf, Puppet, Chef, Ansible, ? 

Resource and job schedulers: I think these are basically the same thing? Torque, Lava, Maui, Moab, SLURM, Grid Engine, Son of Grid Engine, Univa, Platform LSF, etc… others?

Shared filesystems: NFS, pNFS, Lustre, GPFS, PVFS2, GlusterFS, ? 

Library management: Lmod, ? 

Performance monitoring: Ganglia, Nagios, ?

Cluster management toolkits: I believe these perform many of the functions above, all wrapped up in one tool? Rocks, OSCAR, Scyld, Bright, ?

Does anyone have any observations as to which of the above are the most common? Or is that too broad? I believe most of the clusters I will be involved with will be in the 128 to 2000 core range, all on commodity hardware.

Thank you!

- Jeff

