[Beowulf] job scheduler and health monitoring system
Chris Samuel
samuel at unimelb.edu.au
Fri Jan 10 16:49:55 PST 2014
On Fri, 10 Jan 2014 03:36:58 PM reza azimi wrote:
> hello guys,
G'day there. :-)
> I'm looking for a state-of-the-art job scheduler and health monitoring
> system for my beowulf cluster, and in my research I've found so many of
> them that I'm confused. Can you recommend the ones which are popular and
> used in industry?
We used to use Torque (derived from OpenPBS many moons ago) on our x86
clusters but have recently migrated to Slurm and are happy with it -
especially as we now have a common scheduler across our x86 and BlueGene/Q
systems.
Both Torque and Slurm have callouts to health check scripts, and we have our
own infrastructure for those based on work done both here and at $JOB-1. As
Adam noted, there are also the Warewulf NHC scripts, which (I believe) are
not tied to using Warewulf and support both Slurm and Torque.
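For the record, the hooks themselves are just config entries; something like
this (paths and intervals are illustrative, not our actual setup):

    # slurm.conf - run the health check script on each node periodically
    HealthCheckProgram=/usr/sbin/nhc
    HealthCheckInterval=300

    # Torque equivalent, in the pbs_mom config (mom_priv/config)
    $node_check_script /usr/sbin/nhc
    $node_check_interval 10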
Don't think just of physical health checks either; we use our health checks
for other tasks such as Slurm upgrades too. We define the expected version,
and older nodes will mark themselves as DRAIN to allow running jobs to
complete so we can do rolling upgrades. The same goes for kernel versions.
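As a rough sketch of that idea (the expected-version file and its path are
hypothetical, and this assumes sinfo and scontrol are in the PATH):

    #!/usr/bin/env python
    # Sketch: drain this node if its installed Slurm version does not
    # match the version the cluster is expected to be running.
    import socket
    import subprocess

    EXPECTED_FILE = "/etc/cluster/expected-slurm-version"  # hypothetical

    def installed_version():
        # "sinfo --version" prints something like "slurm 2.6.5"
        out = subprocess.check_output(["sinfo", "--version"])
        return out.decode().split()[-1]

    def expected_version():
        with open(EXPECTED_FILE) as f:
            return f.read().strip()

    if installed_version() != expected_version():
        node = socket.gethostname().split(".")[0]
        # DRAIN lets running jobs finish but stops new jobs starting
        subprocess.check_call(["scontrol", "update", "NodeName=" + node,
                               "State=DRAIN",
                               "Reason=slurm_version_mismatch"])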
The one gotcha is that you do not want *any* health check to block when being
called by the Slurm or Torque process on the compute node, so we always run
our master check script from cron and have that process write a file to
/dev/shm. The script called by the queuing system daemon then just parses
that file and acts on what it finds.
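In rough outline (the file name and the checks themselves are placeholders).
First the cron-driven master script:

    #!/usr/bin/env python
    # Master check, run from cron: do the slow (potentially blocking)
    # work here and leave the verdict on tmpfs for the fast script.
    import os
    import tempfile

    STATE_FILE = "/dev/shm/node-health"  # placeholder name

    def run_checks():
        # real checks go here: filesystems, daemons, versions, ...
        return []  # list of failure messages; empty means healthy

    failures = run_checks()
    fd, tmp = tempfile.mkstemp(dir="/dev/shm")
    with os.fdopen(fd, "w") as f:
        f.write("OK\n" if not failures else "\n".join(failures) + "\n")
    os.rename(tmp, STATE_FILE)  # atomic, so readers never see a partial file

And the script the queuing system daemon actually calls:

    #!/usr/bin/env python
    # Called by slurmd/pbs_mom: must return quickly, so it only reads
    # the file the cron job left behind and never runs checks itself.
    import sys

    try:
        with open("/dev/shm/node-health") as f:
            state = f.read().strip()
    except IOError:
        state = "no health data"  # cron job has not run yet

    if state != "OK":
        print(state)
        sys.exit(1)

Exactly what the daemon does with a failure differs between Slurm and Torque,
but the key point is that the second script can never hang on a sick
filesystem or a wedged daemon.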
You haven't mentioned deploying the nodes, remote power control, consoles,
etc. - we use xCAT for all of that (all our nodes have IPMI adapters for
remote management).
> I have the lm-sensors package on my servers and want a health monitoring
> program which records temperatures as well; everything I've found mainly
> records resource utilization.
That's starting to sound more like Ganglia, which you could use in addition
to actual health checks.
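If you do go that way, temperatures are easy to inject as extra metrics; a
minimal sketch using Ganglia's gmetric tool (the sensor path is assumed, and
gmond must already be running on the node):

    #!/usr/bin/env python
    # Sketch: push a CPU temperature reading into Ganglia via gmetric.
    import subprocess

    def read_temp():
        # hard-wired hwmon path for brevity; see the coretemp example below
        with open("/sys/class/hwmon/hwmon0/temp1_input") as f:
            return int(f.read()) / 1000.0  # sysfs reports millidegrees C

    subprocess.check_call(["gmetric", "--name=cpu_temp",
                           "--value=%.1f" % read_temp(),
                           "--type=float", "--units=Celsius"])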
Also, if you've got relatively recent Intel CPUs, you can use the kernel's
"coretemp" module to read certain temperatures that way too (YMMV).
If you've got IPMI adapters in the nodes for remote power control then they
can often return useful sensor data - with the advantage that it's completely
out of band from the host, so you can get information without perturbing
what is running on the compute node.
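For instance, something like this will pull the temperature sensors from a
node's BMC over the network (hostname, username and password file are all
placeholders):

    #!/usr/bin/env python
    # Sketch: read a node's temperature sensors out of band via ipmitool.
    import subprocess

    out = subprocess.check_output(
        ["ipmitool", "-I", "lanplus",
         "-H", "node01-ipmi",              # placeholder BMC hostname
         "-U", "admin",                    # placeholder username
         "-f", "/etc/cluster/ipmi-pass",   # placeholder password file
         "sdr", "type", "Temperature"])
    print(out.decode())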
> Our workloads are mainly MPI-based benchmarks, and we want to test some
> Hadoop benchmarks in the future.
According to a presentation I saw at the Slurm User Group back in September,
Intel are working on Hadoop support in Slurm in such a way that you will not
need a modified Hadoop stack. Not sure when that code will land, though.
Hope this is useful!
Happy new year all! :-)
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci