[Beowulf] job scheduler and health monitoring system

Fri Jan 10 13:22:46 PST 2014

Hi Reza,

The "common stack" seems to vary depending on what industry you're looking
at. For example, Grid Engine seems to be a really popular job scheduler in
bioinformatics, even though I get the impression that it's on the way out
in a lot of other industries.

I think most cluster management tools are fairly mature right now. Some are
more actively developed than others, but I don't think "what's hot" is
necessarily a good way to choose your tools.

More important is whether someone on your team is familiar with those
tools, or with the languages they're written in; or whether you can get
support easily if you don't have expertise yourself.

For what it's worth, my current "favorites" for scheduling and monitoring
include:

* Job scheduler: SLURM
* Light-weight health checks between jobs: Warewulf NHC
* Detailed performance monitoring: Ganglia

Neither NHC nor Ganglia does temperature monitoring out of the box (last I
checked), but both are really easy to extend with something as simple as
bash scripts.
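
As a rough illustration of that kind of extension, here is a minimal sketch
(in Python rather than bash) that pulls the hottest reading out of
`sensors -u` from lm-sensors and hands it to Ganglia's `gmetric`. The metric
name `node_temp_max` and the function names are made up for the example, and
it assumes your sensors report `tempN_input` lines, so check the output on
your own hardware first.

```python
import re
import subprocess

# Matches raw lm-sensors lines such as "  temp2_input: 45.000"
TEMP_LINE = re.compile(r"^\s*temp\d+_input:\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_temps(raw):
    """Return every tempN_input reading (degrees C) found in `sensors -u` text."""
    temps = []
    for line in raw.splitlines():
        match = TEMP_LINE.match(line)
        if match:
            temps.append(float(match.group(1)))
    return temps


def hottest(raw):
    """Return the highest reading, or None if no temperature lines were found."""
    temps = parse_temps(raw)
    return max(temps) if temps else None


def gmetric_command(value, name="node_temp_max"):
    """Build the gmetric argument list; the metric name is a hypothetical choice."""
    return ["gmetric", "--name", name, "--value", f"{value:.1f}",
            "--type", "float", "--units", "Celsius"]


def report():
    """Read the sensors once and publish the hottest value to Ganglia."""
    raw = subprocess.run(["sensors", "-u"], capture_output=True,
                         text=True, check=True).stdout
    value = hottest(raw)
    if value is not None:
        subprocess.run(gmetric_command(value), check=True)
```

Calling `report()` from cron every minute or so on each node should be enough
to get a temperature graph next to the usual Ganglia metrics.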

Adam

On Fri, Jan 10, 2014 at 12:36 PM, reza azimi <reza.c.azimi at gmail.com> wrote:

> Hello guys,
>
> I'm looking for a state-of-the-art job scheduler and health monitoring
> system for my Beowulf cluster. My research turned up so many options that
> I'm confused. Can you recommend the ones that are popular and widely used
> in industry?
> I have the lm-sensors package on my servers and want a health monitoring
> program that records temperature as well; everything I've found mainly
> records resource utilization.
> Our workloads are mainly MPI-based benchmarks, and we want to test some
> Hadoop benchmarks in the future.
>
>
> Regards
> Reza
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
>