[Beowulf] job scheduler and health monitoring system
Chris Samuel
samuel at unimelb.edu.au
Fri Jan 10 16:49:55 PST 2014
On Fri, 10 Jan 2014 03:36:58 PM reza azimi wrote:
> hello guys,
G'day there. :-)
> I'm looking for a state-of-the-art job scheduler and health monitoring
> system for my beowulf cluster, and in my research I've found so many of
> them that I'm confused. Can you recommend the ones which are popular and
> used in industry?
We used to use Torque (derived from OpenPBS many moons ago) on our x86
clusters but have recently migrated to Slurm and are happy with it -
especially as we now have a common scheduler across our x86 and BlueGene/Q
systems.
Both Torque and Slurm have callouts to health check scripts, and we have our
own infrastructure for those based on work done both here and at $JOB-1. As
Adam noted, there are also the Warewulf NHC scripts, which (I believe) are
not tied to using Warewulf and support both Slurm and Torque.
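For the record, the hooks themselves are just config entries; something like
this (paths and intervals are illustrative, not our actual setup):

    # slurm.conf - run the health check script on each node periodically
    HealthCheckProgram=/usr/sbin/nhc
    HealthCheckInterval=300

    # Torque equivalent, in the pbs_mom config (mom_priv/config)
    $node_check_script /usr/sbin/nhc
    $node_check_interval 10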
Don't think just of physical health checks either; we use our health checks
for other tasks such as Slurm upgrades too. We define the expected version,
and older nodes will mark themselves as DRAIN to allow running jobs to
complete so we can do rolling upgrades. The same goes for kernel versions.
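As a rough sketch of that idea (the expected-version file and its path are
hypothetical, and this assumes sinfo and scontrol are in the PATH):

    #!/usr/bin/env python
    # Sketch: drain this node if its installed Slurm version does not
    # match the version the cluster is expected to be running.
    import socket
    import subprocess

    EXPECTED_FILE = "/etc/cluster/expected-slurm-version"  # hypothetical

    def installed_version():
        # "sinfo --version" prints something like "slurm 2.6.5"
        out = subprocess.check_output(["sinfo", "--version"])
        return out.decode().split()[-1]

    def expected_version():
        with open(EXPECTED_FILE) as f:
            return f.read().strip()

    if installed_version() != expected_version():
        node = socket.gethostname().split(".")[0]
        # DRAIN lets running jobs finish but stops new jobs starting
        subprocess.check_call(["scontrol", "update", "NodeName=" + node,
                               "State=DRAIN",
                               "Reason=slurm_version_mismatch"])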
The one gotcha is that you do not want *any* health check to block when being
called by the Slurm or Torque process on the compute node, so we always run
our master check script from cron and have that process write a file to
/dev/shm. The script called by the queuing system daemon then just parses
that file and acts on what it finds.
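In rough outline (the file name and the checks themselves are placeholders).
First the cron-driven master script:

    #!/usr/bin/env python
    # Master check, run from cron: do the slow (potentially blocking)
    # work here and leave the verdict on tmpfs for the fast script.
    import os
    import tempfile

    STATE_FILE = "/dev/shm/node-health"  # placeholder name

    def run_checks():
        # real checks go here: filesystems, daemons, versions, ...
        return []  # list of failure messages; empty means healthy

    failures = run_checks()
    fd, tmp = tempfile.mkstemp(dir="/dev/shm")
    with os.fdopen(fd, "w") as f:
        f.write("OK\n" if not failures else "\n".join(failures) + "\n")
    os.rename(tmp, STATE_FILE)  # atomic, so readers never see a partial file

And the script the queuing system daemon actually calls:

    #!/usr/bin/env python
    # Called by slurmd/pbs_mom: must return quickly, so it only reads
    # the file the cron job left behind and never runs checks itself.
    import sys

    try:
        with open("/dev/shm/node-health") as f:
            state = f.read().strip()
    except IOError:
        state = "no health data"  # cron job has not run yet

    if state != "OK":
        print(state)
        sys.exit(1)

Exactly what the daemon does with a failure differs between Slurm and Torque,
but the key point is that the second script can never hang on a sick
filesystem or a wedged daemon.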
You haven't mentioned deploying the nodes, remote power control, consoles,
etc. - we use xCAT for all of that (all our nodes have IPMI adapters for
remote management).
> I have the lm-sensors package on my servers and want a health monitoring
> program which records temperatures as well; everything I've found mainly
> records resource utilization.
That's starting to sound more like Ganglia, which you could use in addition
to actual health checks.
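If you do go that way, temperatures are easy to inject as extra metrics; a
minimal sketch using Ganglia's gmetric tool (the sensor path is assumed, and
gmond must already be running on the node):

    #!/usr/bin/env python
    # Sketch: push a CPU temperature reading into Ganglia via gmetric.
    import subprocess

    def read_temp():
        # hard-wired hwmon path for brevity; see the coretemp example below
        with open("/sys/class/hwmon/hwmon0/temp1_input") as f:
            return int(f.read()) / 1000.0  # sysfs reports millidegrees C

    subprocess.check_call(["gmetric", "--name=cpu_temp",
                           "--value=%.1f" % read_temp(),
                           "--type=float", "--units=Celsius"])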
Also, if you've got relatively recent Intel CPUs, you can use the kernel's
"coretemp" module to read certain temperatures that way too (YMMV).
If you've got IPMI adapters in the nodes for remote power control then they
can often return useful sensor data - with the advantage that it's completely
out of band from the host, so you can get information without perturbing
what is running on the compute node.
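For instance, something like this will pull the temperature sensors from a
node's BMC over the network (hostname, username and password file are all
placeholders):

    #!/usr/bin/env python
    # Sketch: read a node's temperature sensors out of band via ipmitool.
    import subprocess

    out = subprocess.check_output(
        ["ipmitool", "-I", "lanplus",
         "-H", "node01-ipmi",              # placeholder BMC hostname
         "-U", "admin",                    # placeholder username
         "-f", "/etc/cluster/ipmi-pass",   # placeholder password file
         "sdr", "type", "Temperature"])
    print(out.decode())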
> Our workloads are mainly MPI-based benchmarks, and we want to test some
> Hadoop benchmarks in the future.
According to a presentation I saw at the Slurm User Group back in September,
Intel are working on Hadoop support in Slurm in such a way that you will not
need a modified Hadoop stack. Not sure when that code will land, though.
Hope this is useful!
Happy new year all! :-)
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci