Beowulf and Big Brother

Martin Siegert siegert at sfu.ca
Tue Nov 12 12:42:56 PST 2002


On Mon, Nov 11, 2002 at 09:55:56PM +0200, Ollisl wrote:
> I just realized that this question would be better put to the Big
> Brother people. But maybe you have some comments about it too. So:
> 
> We have 2000 computing nodes and 96 monitoring computers. There is a
> possibility that we will have 96 different Beowulf clusters there, each
> with about 20 PCs, but you never know (no decisions yet on that
> matter) ;) I was just wondering if it is reasonable or smart to monitor
> these master nodes with Big Brother? Or are there even ready-made shell
> scripts for that?
> 
> I was thinking of something like this: on each master node, a script
> runs every once in a while, gathering data on the status of each slave
> node. That data is then sent to the Big Brother server whenever it is
> asked for. So every master would be running a BB client.
> 
> Does it make any sense to do things that way?

I am using Big Brother in exactly that way on our cluster (96 dual AMD).
On the master I do a TCP ping (cf. "anna") of all slaves to check for
connectivity. I also modified the rwhod daemon so that it includes
information about CPU temperatures and fan speeds (and so that the slaves
send to the master only, instead of using broadcasts).
Thus all relevant information about the slaves is available on the
master. Only the master runs a BB client.
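
For illustration, here is a minimal Python sketch of the kind of
connectivity check described above. It is not the script used on the
cluster: the function names, the choice of port 22, and the report layout
are all assumptions, and how the result is handed to the BB client is
left out on purpose. A "TCP ping" is taken here to mean simply trying to
open a TCP connection to each slave with a short timeout.

```python
import socket


def tcp_ping(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port opens within timeout.

    Port 22 is an assumption; any port the slaves are known to listen on
    would do.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def check_slaves(hosts, port=22, timeout=2.0, probe=tcp_ping):
    """Probe every slave once and return a {hostname: reachable} dict.

    The probe argument exists so the check can be exercised without a
    network (hypothetical helper, not part of Big Brother).
    """
    return {host: probe(host, port, timeout) for host in hosts}


def summarize(results):
    """Collapse per-slave results into one (color, text) pair for the master.

    "green" and "red" follow Big Brother's color convention; the text
    layout is made up for this sketch.
    """
    down = sorted(host for host, ok in results.items() if not ok)
    color = "red" if down else "green"
    lines = ["%d/%d slaves reachable" % (len(results) - len(down), len(results))]
    lines += ["  %s: no TCP response" % host for host in down]
    return color, "\n".join(lines)
```

On the master this could be called from cron with the list of slave
hostnames, e.g. summarize(check_slaves(["node01", "node02"])), where the
hostnames are placeholders. The point of the design is that only one
aggregated status per cluster leaves the master.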

Works well for me.

Martin

========================================================================
Martin Siegert
Academic Computing Services                        phone: (604) 291-4691
Simon Fraser University                            fax:   (604) 291-4242
Burnaby, British Columbia                          email: siegert at sfu.ca
Canada  V5A 1S6
========================================================================


