[Beowulf] What services do you run on your cluster nodes?

Ashley Pittman apittman at concurrent-thinking.com
Tue Sep 23 05:03:31 PDT 2008


On Mon, 2008-09-22 at 15:52 -0700, Bernard Li wrote:
> Or you can re-post with a different topic here, and I'll try my best
> to answer them :-)

We collect a half dozen or so simple metrics per node and add them to
Ganglia using "gmetric".  A naive implementation of this, with 32 nodes
each reporting approximately eight metrics every thirty seconds, achieved
the desired result but cost us over 50% of our application performance.
We have since changed the interval from 30 seconds to five minutes and
synchronised the daemons so they all run at the same moment on every
node, which has got us back most of the performance we lost.  There is a
specific requirement for this data to be monitored, or I'd just turn the
whole thing off.
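
As an illustration only (this is not the poster's actual daemon), the
synchronisation described above could be sketched roughly as follows.
It assumes the node clocks are kept in step (for example by NTP), so
that sleeping until the next multiple of five minutes since the epoch
makes every node report at the same instant.  The function names and
the example metric are hypothetical; the gmetric options used are
--name, --value, --type and --units.

```python
import subprocess
import time

PERIOD = 300  # seconds; the five-minute interval mentioned in the post


def seconds_until_next_boundary(now, period=PERIOD):
    """Time to sleep so that wake-up lands on a multiple of `period`
    seconds since the epoch. With synchronised clocks, every node
    computes the same wake-up instant."""
    return period - (now % period)


def gmetric_command(name, value, mtype="float", units=""):
    """Build the argument list for one gmetric invocation.
    One process is started per metric, which is the cost the post
    goes on to ask about."""
    cmd = ["gmetric", "--name", name, "--value", str(value), "--type", mtype]
    if units:
        cmd += ["--units", units]
    return cmd


def report_all(metrics):
    """Run gmetric once for each (name, value, type, units) tuple."""
    for name, value, mtype, units in metrics:
        subprocess.run(gmetric_command(name, value, mtype, units), check=False)


def run_forever(collect):
    """Hypothetical main loop: sleep to the shared boundary, then report.
    `collect` is a caller-supplied function returning metric tuples."""
    while True:
        time.sleep(seconds_until_next_boundary(time.time()))
        report_all(collect())
```

A caller would pass run_forever() a function returning tuples such as
("scratch_free", 12.5, "float", "GB").  Aligning to wall-clock
boundaries, rather than sleeping a fixed interval after each run, keeps
the nodes from drifting apart over time, so the application is
interrupted everywhere at once instead of at staggered moments.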

One problem with this that I'm still not happy with is that we need to
run a separate gmetric for each metric we collect.  Is there a way to
pass gmetric a list of metrics so that it only has to start up and send
data once rather than N times?

Ashley.



