[Beowulf] Most common cluster management software, job schedulers, etc?

Paul McIntosh paul.mcintosh at monash.edu
Tue Mar 8 13:40:17 PST 2016


FYI - good info from the SC13 sysadmin BOF - http://isaac.lsu.edu/sc13/ - it would be nice to have this updated on a yearly basis.

-----Original Message-----
From: Beowulf [mailto:beowulf-bounces at beowulf.org] On Behalf Of Douglas Eadline
Sent: Wednesday, 9 March 2016 7:39 AM
To: Jeff Friedman <jeff.friedman at siliconmechanics.com>
Cc: beowulf at beowulf.org
Subject: Re: [Beowulf] Most common cluster management software, job schedulers, etc?


Jeff,

Many of the applications (within each group) do the basics. It really depends on the level of features, project/community activity, and support you need.
For instance, some projects are great but have not been touched in years, while others need a bit of time investment to get working.

Also, if you are looking for a high-level overview of HPC, you can have a look at the free AMD "HPC for Dummies" (a little dated, and no GPU coverage, however):

  http://insidehpc.com/2012/09/free-download-hpc-for-dummies/


--
Doug


> Hello all. I am just entering the HPC Sales Engineering role, and 
> would like to focus my learning on the most relevant stuff. I have 
> searched near and far for a current survey of some sort listing the 
> top used “stacks”, but cannot seem to find one that is free. I was 
> breaking things down similar to this:
>
> OS distro:  CentOS, Debian, TOSS, etc?  I know some come trimmed down, 
> and also include specific HPC kernels, like CNL, CNK, INK?
>
> MPI options: MPICH2, MVAPICH2, Open MPI, Intel MPI, ?
>
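
All of the MPI implementations above target the same MPI standard, so application code is portable across them; only the compiler wrapper and launcher names differ. A minimal sketch in C (assuming a wrapper such as mpicc is on the PATH):

  /* hello_mpi.c - print the rank of each process */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size;
      MPI_Init(&argc, &argv);                 /* start the MPI runtime */
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
      MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total process count */
      printf("rank %d of %d\n", rank, size);
      MPI_Finalize();                         /* shut the runtime down */
      return 0;
  }

Build with "mpicc hello_mpi.c -o hello_mpi" and run with "mpirun -np 4 ./hello_mpi" (or whatever launcher your stack provides).
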
> Provisioning software: Cobbler, Warewulf, xCAT, Openstack, Platform HPC, ?
>
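
As a rough illustration of what the provisioning tools do, Cobbler registers a node's MAC address against an install profile and regenerates the PXE/DHCP configuration; the node, profile, and address values below are made up:

  # define a compute node against an existing install profile
  cobbler system add --name=node01 --profile=centos6-x86_64 \
      --mac=00:11:22:33:44:55 --ip-address=10.0.0.11
  # push the change out to the PXE/DHCP/TFTP services
  cobbler sync
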
> Configuration management: Warewulf, Puppet, Chef, Ansible, ?
>
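
The configuration-management tools all express desired state declaratively rather than as shell scripts. A minimal Ansible playbook sketch (the "compute" host group is hypothetical):

  # site.yml - make sure NTP is installed and running on compute nodes
  - hosts: compute
    become: yes
    tasks:
      - name: install ntp
        yum: name=ntp state=present
      - name: enable and start ntpd
        service: name=ntpd state=started enabled=yes

Apply it with "ansible-playbook -i hosts site.yml"; re-running it is safe because each task only changes what is out of spec.
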
> Resource and job schedulers: I think these are basically the same thing?
> Torque, Lava, Maui, Moab, SLURM, Grid Engine, Son of Grid Engine, 
> Univa, Platform LSF, etc… others?
>
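
They overlap but are not quite the same thing: Torque is a resource manager (it tracks nodes and runs jobs), Maui/Moab are schedulers that decide placement and priority on top of it, while SLURM and the Grid Engine family bundle both roles. From the user's side the artifact is the same, a batch script; a minimal SLURM sketch (the partition name and binary are illustrative):

  #!/bin/bash
  #SBATCH --job-name=hello
  #SBATCH --partition=compute
  #SBATCH --nodes=2
  #SBATCH --ntasks-per-node=16
  #SBATCH --time=00:10:00

  # launch an MPI binary across the allocated nodes
  srun ./hello_mpi

Submit with "sbatch hello.sh" and watch it with "squeue".
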
> Shared filesystems: NFS, pNFS, Lustre, GPFS, PVFS2, GlusterFS, ?
>
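
For clusters in the size range you mention, plain NFS exported from the head node is the usual starting point; the parallel filesystems earn their complexity only once I/O becomes the bottleneck. A minimal NFS sketch (hostnames, paths, and the subnet are made up):

  # on the head node: /etc/exports
  /home    10.0.0.0/24(rw,sync,no_root_squash)

  # on each compute node: /etc/fstab
  head:/home    /home    nfs    defaults    0 0
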
> Library management: Lmod, ?
>
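
Lmod (a Lua rewrite of the older Tcl environment modules) manipulates PATH, LD_LIBRARY_PATH, and friends so users can switch toolchains cleanly. Typical user commands (the module names are illustrative):

  module avail              # list modules available on this system
  module load gcc openmpi   # put a compiler and MPI stack in the environment
  module list               # show what is currently loaded
  module unload openmpi     # back one module out again
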
> Performance monitoring: Ganglia, Nagios, ?
>
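
These two do different jobs, so many sites run both: Ganglia trends metrics (load, memory, network) across the cluster, while Nagios checks services and alerts on failures. A minimal Nagios service check sketch (the host name is made up; check_ssh is a stock plugin command):

  define service {
      use                  generic-service
      host_name            node01
      service_description  SSH
      check_command        check_ssh
  }
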
> Cluster management toolkits: I believe these perform many of the 
> functions above, all wrapped up in one tool?  Rocks, Oscar, Scyld, Bright, ?
>
>
> Does anyone have any observations as to which of the above are the 
> most common?  Or is that too broad?  I believe most of the clusters I 
> will be involved with will be in the 128 - 2000 core range, all on 
> commodity hardware.
>
> Thank you!
>
> - Jeff
>




_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


