[Beowulf] Configuration Management and Monitoring of a Debian Etch Beowulf Cluster

Farid Behnia behnia at gmail.com
Thu Aug 30 07:50:28 PDT 2007


Hi,

I've managed to put together a simple 2-node cluster using Debian Etch,
OpenMPI, FAI & Cfengine.

I'm looking for ideas that can help me build a better self-healing
cluster. Right now I'm writing rule files for cfengine and would appreciate
any input on sample files and on the important configurations that need to be
in place for the cluster's health. (I know these are site-specific, but I'm
sure I can get good hints out of them.)

I'd also be glad to hear of any monitoring system you have in mind
that can cooperate with cfengine on the maintenance job. I've looked briefly
into Ganglia and Nagios so far. It seems Ganglia is mostly meant for large
(groups of) clusters and focuses on hardware resources. Nagios seems
better suited to my job, but the gurus on the cfengine mailing list believe
that cfenvd & cfexecd can provide equal monitoring & recovery capability (in
terms of response time).
What's your take on either approach?

Thanks in advance for any input.