[Beowulf] using Nagios to monitor compute nodes: NRPE vs check_by_ssh

Alex Younts ayounts at tinkergeek.com
Tue Dec 23 11:05:07 PST 2008


We have quite a few different PBS servers running PBSPro 9.x. Our
Nagios box has a bare install of PBSPro, and we wrote a check
script that runs "pbsnodes -s $cluster-head-node $nodehostname" and
checks whether PBS thinks the node is happy. (We determine which PBS
server to query based on the host name of the node.)
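
For anyone wanting a starting point, here is a rough sketch of what such a
check could look like in Python. It is not our actual script. The prefix
table, the example.org server names, the set of states treated as bad, and
the assumption that pbsnodes prints a "state = ..." line with
comma-separated values are all illustrative guesses that you should verify
against your own PBS output.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical mapping from node hostname prefix to PBS server.
SERVER_BY_PREFIX = {
    "alpha": "alpha-head.example.org",
    "beta": "beta-head.example.org",
}

# Assumption: these node states mean the node is not healthy.
BAD_STATES = {"down", "offline", "state-unknown", "stale"}


def server_for(node):
    """Pick the PBS server to query from the node's hostname prefix."""
    for prefix, server in SERVER_BY_PREFIX.items():
        if node.startswith(prefix):
            return server
    return None


def parse_states(output):
    """Return the values of the first 'state = a,b' line, or [] if none."""
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "state":
            return [s.strip() for s in value.split(",") if s.strip()]
    return []


def classify(states):
    """Map a list of node states to a (Nagios exit code, message) pair."""
    if not states:
        return UNKNOWN, "UNKNOWN - no state line in pbsnodes output"
    summary = ",".join(states)
    if set(states) & BAD_STATES:
        return CRITICAL, "CRITICAL - node state: " + summary
    return OK, "OK - node state: " + summary


def check_node(node):
    """Run pbsnodes for one node and return (exit code, message)."""
    server = server_for(node)
    if server is None:
        return UNKNOWN, "UNKNOWN - no PBS server known for " + node
    try:
        result = subprocess.run(
            ["pbsnodes", "-s", server, node],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return UNKNOWN, "UNKNOWN - could not run pbsnodes: %s" % exc
    if result.returncode != 0:
        return UNKNOWN, "UNKNOWN - pbsnodes exited with %d" % result.returncode
    return classify(parse_states(result.stdout))
```

To use it as a Nagios plugin, call check_node() with the node name from the
command line, print the message, and exit with the returned code. Keeping
the parsing separate from the pbsnodes call makes it easy to test against
saved output.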

Alex Younts

On Tue, Dec 23, 2008 at 1:24 PM, Rahul Nabar <rpnabar at gmail.com> wrote:
> On Mon, Dec 22, 2008 at 10:23 PM, Alex Younts <ayounts at tinkergeek.com> wrote:
>> At my employer, we use a variety of monitoring tools for our various
>> clusters. Our Nagios box is a VM with a single processor and 512MB of
>> memory. Currently, we monitor 1700 hosts, each with three or four
>> service checks apiece (two of which SSH to nodes to run scripts). We
>> check services about every 30 minutes.
>
> Thanks Alex! I will give that a shot now! Are there any torque / pbs /
> maui monitoring Nagios scripts out there? I wanted to avoid
> reinventing the wheel if at all possible!
>
> --
> Rahul
>


