[Beowulf] How do people keep track of computers in your cluster(s)?

Mark Hahn hahn at mcmaster.ca
Mon Oct 22 09:15:08 PDT 2007


> computers which we are going to buy. On the one hand there are the
> standard tools to monitor a running cluster like ganglia, nagios,
> zenoss, ... but these are - to my understanding - just for monitoring
> the current status.

one problem with tracking this kind of information is that it's naturally
tied to many other things.  for instance, the scheduler may want to know
the layout of nodes (to minimize the number of switches spanned by a
parallel job).  similarly, I'd very much like my syslog/eventlog to be
correlatable, through the hardware, to the actual jobs running at the
time.
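
to make the two examples above concrete, here is a minimal sketch in
python.  everything in it is hypothetical: the Job and Event records, the
node-to-switch map and the function names are made up for illustration and
do not correspond to any particular scheduler or log system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """A scheduler job: which nodes it ran on, and when (hypothetical record)."""
    job_id: str
    nodes: frozenset
    start: float  # seconds since epoch
    end: float    # seconds since epoch


@dataclass(frozen=True)
class Event:
    """One syslog/eventlog entry tied to a node (hypothetical record)."""
    node: str
    timestamp: float
    message: str


def switches_spanned(nodes, node_to_switch):
    """Count the distinct switches touched by a set of nodes."""
    return len({node_to_switch[n] for n in nodes})


def jobs_for_event(event, jobs):
    """Return the ids of jobs that held the event's node when it was logged."""
    return [
        j.job_id
        for j in jobs
        if event.node in j.nodes and j.start <= event.timestamp <= j.end
    ]
```

both functions need only a node name as the common key, which is the
point: the inventory is what lets the scheduler's view and the log's view
be joined.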

in a sense, ideally all information would be integrated.  but that would
require _everything_ to be standardized (your scheduler must be able to 
read the same DB/tables as your event management system, etc).  outside 
of Redmond, I'm not sure how practical that is.
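
as a rough illustration of what "the same DB/tables" could mean, here is
a sketch using python's sqlite3 module.  the schema (nodes, job_nodes,
events) and the function names are assumptions invented for this example,
not the layout used by any real scheduler or event manager.

```python
import sqlite3

# Hypothetical shared schema: one inventory table that both the
# scheduler-side and the event-side tables refer to by node name.
SCHEMA = """
CREATE TABLE nodes (
    name   TEXT PRIMARY KEY,
    switch TEXT NOT NULL
);
CREATE TABLE job_nodes (
    job_id   TEXT NOT NULL,
    node     TEXT NOT NULL REFERENCES nodes(name),
    start_ts REAL NOT NULL,
    end_ts   REAL NOT NULL
);
CREATE TABLE events (
    node    TEXT NOT NULL REFERENCES nodes(name),
    ts      REAL NOT NULL,
    message TEXT NOT NULL
);
"""


def open_inventory():
    """Create an in-memory database holding the shared tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def switches_for_job(conn, job_id):
    """Scheduler-side query: how many switches does this job span?"""
    row = conn.execute(
        "SELECT COUNT(DISTINCT n.switch) "
        "FROM job_nodes j JOIN nodes n ON n.name = j.node "
        "WHERE j.job_id = ?",
        (job_id,),
    ).fetchone()
    return row[0]


def events_with_jobs(conn):
    """Event-side query: each event with its switch and any job on that node
    at that time (job_id is None when the node was idle)."""
    return conn.execute(
        "SELECT e.ts, e.node, n.switch, j.job_id, e.message "
        "FROM events e "
        "JOIN nodes n ON n.name = e.node "
        "LEFT JOIN job_nodes j "
        "  ON j.node = e.node AND e.ts BETWEEN j.start_ts AND j.end_ts "
        "ORDER BY e.ts"
    ).fetchall()
```

the queries are trivial once the tables are shared; the hard part, as
noted, is getting every tool to agree on them.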


