[Beowulf] Most common cluster management software, job schedulers, etc?
Jeff Friedman
jeff.friedman at siliconmechanics.com
Wed Mar 9 16:34:06 PST 2016
Thanks everyone! Your replies were very helpful.
>
>
>> On Mar 8, 2016, at 2:49 PM, Christopher Samuel <samuel at unimelb.edu.au> wrote:
>>
>> On 08/03/16 15:43, Jeff Friedman wrote:
>>
>>> Hello all. I am just entering the HPC Sales Engineering role, and would
>>> like to focus my learning on the most relevant stuff. I have searched
>>> near and far for a current survey of some sort listing the top used
>>> “stacks”, but cannot seem to find one that is free. I was breaking
>>> things down similar to this:
>>
>> All the following is just what we use here, but in your role I would have
>> thought you'll need to be familiar with most of the options, depending on
>> customer requirements. Specialising in your preferred suite is down to
>> you, of course!
>>
>>> _OS distro_: CentOS, Debian, TOSS, etc? I know some come trimmed down,
>>> and some ship lightweight compute-node kernels, like CNL, CNK, or INK?
>>
>> RHEL - hardware vendors' support attitude tends to be "we support both
>> types of Linux, RHEL and SLES".
>>
>>> _MPI options_: MPICH2, MVAPICH2, Open MPI, Intel MPI, ?
>>
>> Open MPI
>>
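For reference, a minimal MPI "hello world" in C - the same source should build
unchanged against Open MPI, MPICH/MVAPICH2 or Intel MPI, since they all
implement the standard MPI API. The mpicc/mpirun wrapper names below are the
common ones; exact wrapper and module names vary by site.

    /* hello_mpi.c - minimal MPI example */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, namelen;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);                  /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of ranks */
        MPI_Get_processor_name(name, &namelen);  /* host this rank landed on */

        printf("Hello from rank %d of %d on %s\n", rank, size, name);

        MPI_Finalize();                          /* shut the runtime down */
        return 0;
    }

Compile and run (normally launched under the batch system rather than by hand):

    mpicc hello_mpi.c -o hello_mpi
    mpirun -np 4 ./hello_mpi
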
>>> _Provisioning software_: Cobbler, Warewulf, xCAT, Openstack, Platform HPC, ?
>>
>> xCAT
>>
>>> _Configuration management_: Warewulf, Puppet, Chef, Ansible, ?
>>
>> xCAT
>>
>> We use Puppet for infrastructure VMs (running Debian).
>>
>>> _Resource and job schedulers_: I think these are basically the same
>>> thing? Torque, Lava, Maui, Moab, SLURM, Grid Engine, Son of Grid Engine,
>>> Univa, Platform LSF, etc… others?
>>
>> Yes and no - we run Slurm and use its own scheduling mechanisms but you
>> could plug in Moab should you wish.
>>
>> Torque has an example pbs_sched but that's just a FIFO, you'd want to
>> look at Maui or Moab for more sophisticated scheduling.
>>
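To make the FIFO-versus-priority distinction above concrete, here is a toy
sketch in C - not actual pbs_sched, Maui or Moab code, just an illustration
that a FIFO scheduler considers jobs strictly in submission order, while a
priority-based scheduler first orders the queue by a computed priority
(fairshare, queue time, job size, QOS and so on) and, in the real products,
also backfills small jobs around large ones that cannot start yet.

    /* scheduler_order.c - toy illustration of FIFO vs priority ordering */
    #include <stdio.h>
    #include <stdlib.h>

    /* Toy job record - the fields are purely illustrative. */
    struct job { int id; long submit_time; int nodes_needed; double priority; };

    /* FIFO ordering (pbs_sched style): oldest submission first. */
    static int by_submit_time(const void *a, const void *b)
    {
        const struct job *x = a, *y = b;
        return (x->submit_time > y->submit_time) - (x->submit_time < y->submit_time);
    }

    /* Priority ordering (Maui/Moab style): highest computed priority first. */
    static int by_priority(const void *a, const void *b)
    {
        const struct job *x = a, *y = b;
        return (y->priority > x->priority) - (y->priority < x->priority);
    }

    int main(void)
    {
        struct job queue[] = {
            { 1, 100, 64, 0.2 },  /* big job, submitted first, low priority    */
            { 2, 200,  4, 0.9 },  /* small job, submitted later, high priority */
            { 3, 300,  8, 0.5 },
        };
        int n = sizeof queue / sizeof queue[0];

        qsort(queue, n, sizeof queue[0], by_submit_time);
        printf("FIFO order:     ");
        for (int i = 0; i < n; i++) printf("job %d  ", queue[i].id);
        printf("\n");

        qsort(queue, n, sizeof queue[0], by_priority);
        printf("Priority order: ");
        for (int i = 0; i < n; i++) printf("job %d  ", queue[i].id);
        printf("\n");

        return 0;
    }

Running it prints the two consideration orders (1, 2, 3 versus 2, 3, 1), which
is the whole point: the queue discipline, not the submission time, decides who
runs next.
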
>>> _Shared filesystems_: NFS, pNFS, Lustre, GPFS, PVFS2, GlusterFS, ?
>>
>> GPFS here - copes well with lots of small files (looks at one OpenFOAM
>> project that has over 19 million files & directories - mostly
>> directories - and sighs).
>>
>>> _Library management_: Lmod, ?
>>
>> I've been using environment modules for almost a decade now but our
>> recent cluster has switched to Lmod.
>>
>>> _Performance monitoring_: Ganglia, Nagios, ?
>>
>> We use Icinga for monitoring infrastructure, including polling xCAT and
>> Slurm for node information such as error LEDs, down nodes, etc.
>>
>> We have pnp4nagios integrated with our Icinga to record time series
>> information about memory usage, etc.
>>
>>> _Cluster management toolkits_: I believe these perform many of the
>>> functions above, all wrapped up in one tool? Rocks, Oscar, Scyld, Bright, ?
>>
>> N/A here.
>>
>> All the best!
>> Chris
>> --
>> Christopher Samuel Senior Systems Administrator
>> VLSCI - Victorian Life Sciences Computation Initiative
>> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
>> http://www.vlsci.org.au/ http://twitter.com/vlsci