[Beowulf] What services do you run on your cluster nodes?
Robert G. Brown
rgb at phy.duke.edu
Tue Sep 23 03:17:27 PDT 2008
On Mon, 22 Sep 2008, Matt Lawrence wrote:
> On Mon, 22 Sep 2008, Bernard Li wrote:
>> Ganglia collects metrics from hosts and trends them for the user.
>> Most of these metrics need to be collected from the host itself (CPU,
>> memory, load, etc.).
>> Besides, the footprint of Ganglia is very small. I have yet to hear of
>> a user complaining that Ganglia uses too many resources. Of course,
>> YMMV: if you need every last bit of CPU/memory for your job, then you
>> should turn everything off, at the cost of managing a black box.
> Well, the folks I talked to at TACC were not enthusiastic about the amount of
> resources ganglia uses. I will agree that there is a lot of unnecessary stuff
> that goes on, like converting everything to and from XML for each message.
XML is (IMO) good, not bad. Converting output to XMLish takes a truly
trivial amount of computation at any reasonable granularity for
monitoring (which should NOT be every second, probably not every five
seconds for a LARGE cluster). And one can get a lot of information into
a single TCP packet -- very few informational updates are likely to take
more than one packet. And one packet costs one TCP latency no matter
what -- it is the minimum cost of playing the game. It takes nearly as
long to send a minimum-length TCP packet as it does to send a full
MTU packet, because per-packet latency is usually over half of the total
cost -- the wire speed is high enough that bandwidth really isn't the
obstacle.
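
To put rough numbers on the latency argument, here is a minimal sketch in
Python. The 50 microsecond latency and the 1 Gbit/s link speed are
assumptions picked for illustration, not measurements from any particular
cluster, and send_time_us is a hypothetical helper name.

```python
# Rough model: time to deliver one packet = fixed latency + serialization time.
# ASSUMPTIONS (illustrative only): 50 us one-way latency, 1 Gbit/s link.

def send_time_us(payload_bytes, latency_us=50.0, link_bits_per_s=1e9):
    """Return the estimated delivery time of one packet, in microseconds."""
    serialization_us = payload_bytes * 8 / link_bits_per_s * 1e6
    return latency_us + serialization_us

small = send_time_us(64)     # a near-minimal packet
full = send_time_us(1500)    # a full Ethernet-MTU packet

print(f"small packet: {small:.2f} us")
print(f"full packet:  {full:.2f} us")
print(f"ratio full/small: {full / small:.2f}")
print(f"latency share of full packet: {50.0 / full:.0%}")
```

Under these assumed numbers the full packet takes about 62 microseconds
against about 50.5 for the small one, so latency is most of the cost either
way. A slower link or a lower-latency interconnect shifts the balance.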
If ganglia sends a whole long stream of TCP packets with multipacket
messages, well, that's just silly. In that case, as I said, consider
xmlsysd, which for most monitoring applications will return a single
packet with all the monitoring information needed (wrapped in XML, to be
sure). xmlsysd can monitor its OWN load impact on the system being
monitored, and that impact is trivial at a sampling granularity on the
order of five seconds, totally invisible at once every minute (which is
ample for most people and purposes).
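
As a sketch of the single-packet point, the Python below wraps a handful of
metrics in XML, checks that the result fits in one TCP segment, and
estimates the CPU cost of building it relative to a five-second sampling
interval. The tag names, the sample values, and build_snapshot are all
hypothetical; this is NOT the actual xmlsysd (or ganglia) message format.
The 1460-byte limit assumes a 1500-byte Ethernet MTU with 20-byte IP and
20-byte TCP headers and no options.

```python
import time
import xml.etree.ElementTree as ET

# Payload limit for one TCP segment on standard Ethernet (assumed: no options).
TCP_MSS = 1500 - 20 - 20

def build_snapshot(host, metrics):
    """Wrap a dict of metric name -> value in a small XML document.

    The schema here is made up for illustration only.
    """
    root = ET.Element("node", name=host)
    for name, value in metrics.items():
        ET.SubElement(root, "metric", name=name).text = str(value)
    return ET.tostring(root, encoding="utf-8")

# Made-up sample values standing in for what a daemon would read from /proc.
sample = {
    "load1": 0.42,
    "load5": 0.37,
    "mem_free_kb": 1843200,
    "cpu_user_pct": 12.5,
    "cpu_sys_pct": 1.8,
    "net_rx_bytes": 918273645,
}

start = time.process_time()
payload = build_snapshot("node001", sample)
cpu_seconds = time.process_time() - start

interval_s = 5.0  # sampling granularity discussed above
fits_one_segment = len(payload) <= TCP_MSS

print(f"payload size: {len(payload)} bytes (limit {TCP_MSS})")
print(f"fits in one segment: {fits_one_segment}")
print(f"CPU share at {interval_s:.0f} s interval: {cpu_seconds / interval_s:.6%}")
```

The CPU figure covers only the XML wrapping, not the cost of gathering the
metrics themselves, so treat it as a lower bound on the real overhead.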
> -- Matt
> It's not what I know that counts.
> It's what I can remember in time to use.
Robert G. Brown Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977