[Beowulf] using SNMP to monitor disk usage and load factors on compute-nodes

Rahul Nabar rpnabar at gmail.com
Wed Dec 24 09:26:58 PST 2008

I was toying with the idea of monitoring some key stats from my
compute-nodes using SNMP (e.g. load factors, local disk usage, health
of my pbs_moms etc.), especially since the Nagios docs seem to
recommend SNMP for monitoring private resources (as opposed to ssh or
nrpe plugins).

I've never worked with SNMP before (aside from noticing that my Dell
switches have an option to export stats via SNMP, which I never used!).
What do the wise Beowulf sysadmins have to say? Any caveats?

I checked with "/etc/init.d/snmpd status", which reports
"/etc/init.d/snmpd: Command not found."

So I guess I first need to install "net-snmp". My compute-nodes are
already behind a firewall, so I guess running this additional service
on them should not be a security issue. Performance might take a tiny
hit, but I doubt it!
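
To make the idea concrete, here is a rough, untested-on-a-cluster sketch of
the kind of check I have in mind. It assumes snmpd is running on the node and
that something like "snmpget -v2c -c public node01 UCD-SNMP-MIB::laLoad.1"
returns the 1-minute load as a line of text ("public" and "node01" are
placeholders, and the exact output format is my assumption from the net-snmp
docs). The function names and the thresholds are made up for illustration; the
exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING,
2 CRITICAL, 3 UNKNOWN).

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Matches e.g. 'UCD-SNMP-MIB::laLoad.1 = STRING: 0.42' (quotes optional).
# The output format is an assumption about what snmpget prints.
LOAD_RE = re.compile(r'laLoad\.\d+\s*=\s*STRING:\s*"?(\d+(?:\.\d+)?)"?')


def parse_load(snmp_line):
    """Return the load average found in one snmpget output line, or None."""
    match = LOAD_RE.search(snmp_line)
    return float(match.group(1)) if match else None


def check_load(snmp_line, warn=4.0, crit=8.0):
    """Map a load reading to an (exit_code, message) pair, Nagios-style.

    warn and crit are hypothetical thresholds; tune them per node.
    """
    load = parse_load(snmp_line)
    if load is None:
        return UNKNOWN, "UNKNOWN - could not parse %r" % snmp_line
    if load >= crit:
        return CRITICAL, "CRITICAL - load %.2f" % load
    if load >= warn:
        return WARNING, "WARNING - load %.2f" % load
    return OK, "OK - load %.2f" % load


if __name__ == "__main__":
    # Sample line standing in for real snmpget output.
    code, message = check_load("UCD-SNMP-MIB::laLoad.1 = STRING: 0.42")
    print(message)
```

In a real plugin the sample line would be replaced by the actual output of
snmpget, and the script would exit with the returned code so Nagios can pick
up the state.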

Does SNMP make for a sound monitoring philosophy, and are others using
it on their clusters?


