[Beowulf] How do people keep track of computers in your cluster(s)?

Joe Landman landman at scalableinformatics.com
Mon Oct 22 05:31:53 PDT 2007


Hi Carsten:


Leif Nixon wrote:
> Carsten Aulbert <carsten.aulbert at aei.mpg.de> writes:
> 
>> I know this sounds just like three medium sized SQL tables, but at least
>> I wanted to ask what people are using if more than a single person is
>> working on the cluster. One person can probably do this with a simple
>> text file and a set of papers in a filing cabinet.
> 
> Seems most answers have focused on how to collect the data, not how to
> store and present it.

   I have been pretty heads down over the last few days.  We 
gather/store/generate most of this information (apart from serial 
numbers, which are in general quite hard to collect for random 
hardware).  We store it in a SQLite database, and use a combination of 
web clients and command line clients to interact with it, presenting 
the output to our users.
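
A minimal sketch of what a SQLite-backed node inventory of this kind 
might look like.  The table name, column names, and helper functions 
here are hypothetical illustrations, not the schema or tooling 
described in this message.

```python
import sqlite3

# Hypothetical schema: one row per node. All names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS nodes (
    hostname  TEXT PRIMARY KEY,
    ip_addr   TEXT NOT NULL,
    mac_addr  TEXT,
    ipmi_addr TEXT,
    ipmi_mac  TEXT,
    serial    TEXT
)
"""

def open_inventory(path=":memory:"):
    """Open (or create) the inventory database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def record_node(conn, hostname, ip_addr, mac_addr=None,
                ipmi_addr=None, ipmi_mac=None, serial=None):
    """Insert a node, or replace the existing row with the same hostname."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO nodes "
            "(hostname, ip_addr, mac_addr, ipmi_addr, ipmi_mac, serial) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (hostname, ip_addr, mac_addr, ipmi_addr, ipmi_mac, serial),
        )

def lookup_node(conn, hostname):
    """Return the stored row for a hostname as a dict, or None if absent."""
    cur = conn.execute(
        "SELECT hostname, ip_addr, mac_addr, ipmi_addr, ipmi_mac, serial "
        "FROM nodes WHERE hostname = ?",
        (hostname,),
    )
    row = cur.fetchone()
    if row is None:
        return None
    return dict(zip([col[0] for col in cur.description], row))
```

The serial column is nullable because, as noted above, that field is 
often not available.  A command line or web client would then be a thin 
layer over lookup_node() and record_node().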

   For IPMI and related per node elements, our units run something 
called a finishing script after installation.  It queries the BMC 
hardware, gathers the IPMI MAC address (if available), and, if 
requested, stores it back into the database via a remote web client.  
We derive the IPMI address from the node's network address using a 
simple algorithm, which the finishing script handles.  The script then 
configures the BMC for us and commits the configuration.  This has 
worked on a fairly wide range of hardware thus far, though some BMCs 
behave quite differently from others.
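
The derivation rule itself is not spelled out here.  As one possible 
illustration, the sketch below assumes a hypothetical rule in which the 
BMC keeps the same host part as the node but lives in a parallel 
management subnet.  The function name and both subnet values are 
assumptions made for the example.

```python
import ipaddress

def derive_ipmi_address(node_ip, node_net="10.1.0.0/16", ipmi_net="10.2.0.0/16"):
    """Map a node address onto a management subnet, keeping the host part.

    Hypothetical rule for illustration only; the subnets are assumptions.
    """
    node = ipaddress.ip_address(node_ip)
    src = ipaddress.ip_network(node_net)
    dst = ipaddress.ip_network(ipmi_net)
    if node not in src:
        raise ValueError(f"{node_ip} is not in {node_net}")
    if src.prefixlen != dst.prefixlen:
        raise ValueError("node and IPMI networks must be the same size")
    # Offset of the node within its own network, reused in the IPMI network.
    host_part = int(node) - int(src.network_address)
    return str(dst.network_address + host_part)
```

With the assumed subnets, derive_ipmi_address("10.1.3.17") gives 
"10.2.3.17".  A rule like this means the BMC address never has to be 
stored or entered by hand; it can always be recomputed from the node 
address.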

   This capability is generally part of our cluster loads.  It is 
distro independent and, for the most part, hardware independent; as 
long as we can talk to/control/set the BMC from a script, we can make 
this part of the stand up process.

> Some people I know are using AT (Asset Tracker) for this purpose. That is
> unfortunately a very generic and overloaded name, but *this* instance
> of Asset Tracker is an addon for Request Tracker (one of the leading
> trouble ticketing systems). It can be found here:
> 
> http://sourceforge.net/projects/rtat/
> 
> I'm not sure how ready for public consumption AT is, though.

AT is built for a more general problem.  You can certainly use it, or 
any other system you wish.  What we have is tied into our cluster 
setup/load system, so it is more cluster specific than the others.

Joe



-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


