Cluster Monitoring software?

Ken McDonell kenmcd at melbourne.sgi.com
Wed Oct 25 15:44:03 PDT 2000


On Wed, 25 Oct 2000, Patrick Lesher wrote:

> 
> SGI has a really nice package called Performance Co-Pilot.
> It monitors all kinds of different things, and they are always adding
> more.
> 
> You can download it from their oss web site ( 
> http://oss.sgi.com/projects/pcp/ )
> 
> If I remember correctly, this version isn't as complete as what you can
> purchase from them or what comes in their ACE package; I can't remember
> what the differences are.

The things that PCP brings to the table for monitoring cluster performance
are:

    - centralized monitoring and management of distributed processing
      (PCP uses very efficient TCP/IP protocols to move the data about)

    - a unified API to access _all_ performance data (from the h/w,
      the o/s, the service layers and the applications) ... this
      includes all of the metrics Joseph was asking about, and lots
      more ... the same API works for different operating systems,
      so monitoring tools are insulated from the details of dredging
      interesting numbers from dark corners of each o/s

    - the available performance data can be easily extended via a
      plugin architecture

    - real-time and historical data sources are unified under the
      same API

    - an inference engine for detecting common performance scenarios
      and raising arbitrary alarms (can be used with both real-time and
      historical data sources)
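
To make the "same API for real-time and historical data" point concrete,
here is a minimal Python sketch. It is not PCP's API: MetricSource,
LiveSource, ArchiveSource and alarms are hypothetical names invented for
illustration, and the "load" metric and its values are made up. It only
shows the shape of the idea: a tool (here a trivial threshold rule) is
written once against a common interface and runs unchanged on either kind
of source, while new metrics are added by registering a collector.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterator, List, Tuple

Sample = Tuple[float, float]  # (timestamp, value)


class MetricSource(ABC):
    """Hypothetical common interface; NOT the real PCP API."""

    @abstractmethod
    def samples(self, metric: str) -> Iterator[Sample]:
        """Yield (timestamp, value) pairs for the named metric."""


class LiveSource(MetricSource):
    """Real-time source: polls per-metric collector callables (the plugins)."""

    def __init__(self, collectors: Dict[str, Callable[[], float]],
                 clock: Callable[[], float], count: int) -> None:
        self.collectors = collectors
        self.clock = clock
        self.count = count  # number of polls to take per call to samples()

    def samples(self, metric: str) -> Iterator[Sample]:
        for _ in range(self.count):
            yield (self.clock(), self.collectors[metric]())


class ArchiveSource(MetricSource):
    """Historical source: replays samples that were recorded earlier."""

    def __init__(self, log: Dict[str, List[Sample]]) -> None:
        self.log = log

    def samples(self, metric: str) -> Iterator[Sample]:
        return iter(self.log[metric])


def alarms(source: MetricSource, metric: str, threshold: float) -> List[float]:
    """Return the timestamps at which the metric exceeded the threshold."""
    return [t for t, v in source.samples(metric) if v > threshold]


# Made-up data: the same rule runs against an archive and a live source.
archive = ArchiveSource({"load": [(0.0, 0.5), (1.0, 2.5), (2.0, 3.0)]})
print(alarms(archive, "load", 2.0))

ticks = iter([10.0, 11.0, 12.0])   # stand-in for a wall clock
loads = iter([0.2, 4.0, 1.0])      # stand-in for a real collector
live = LiveSource({"load": lambda: next(loads)},
                  clock=lambda: next(ticks), count=3)
print(alarms(live, "load", 2.0))
```

The rule in alarms() never needs to know which kind of source it was
handed, which is the property the list above describes.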

There is an open source stripchart monitoring tool developed by Michal
Kara (details from the News page off the oss.sgi.com projects page).

Other SGI-developed monitoring tools (including 3-D visualization of
performance) are not open sourced, but are sold as part of the SGI
Linux solutions (ACE is one example).

We are always keen to communicate with people who'd be interested in
adapting or expanding PCP into new performance monitoring scenarios.




