[Beowulf] RRDtools graphs of temp from IPMI
Craig West
cwest at astro.umass.edu
Sat Nov 8 21:45:18 PST 2008
Gerry,
Like others, I too use ganglia - and have a custom script which reports
cpu temps (and fan speeds) for the nodes. However, I changed ganglia's
default method of communication (multicast) to reduce the chatter. I
use a unicast setup, where each node reports directly to the monitoring
server - a dedicated machine that monitors all the systems and
performs other tasks (dhcp, ntp, imaging, etc.).
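The unicast change is just a couple of stanzas in gmond.conf. Roughly
something like the following - the hostname and port here are
placeholders, and the exact syntax depends on your ganglia version:

/* on every compute node: send metrics straight to the monitoring host */
globals {
  deaf = yes                  /* nodes only send, they don't aggregate */
}
udp_send_channel {
  host = mon.example.org      /* placeholder for the monitoring server */
  port = 8649
}

/* on the monitoring server: accept the unicast traffic */
udp_recv_channel {
  port = 8649
}
tcp_accept_channel {
  port = 8649                 /* gmetad polls this for the XML dump */
}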
Each node uses less than 1 KB/sec to transmit all of its ganglia
information, including my extra metrics. For the useful historical
record you get from this data, it's worth the rather small amount of
network chatter. You can tune the metrics further: turn off the ones
you don't want, or have them report less often.
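The extra metrics are just a cron job that parses ipmitool output and
hands the values to gmetric. A rough sketch of that kind of script, in
Python - the sensor names, gmetric path, and parsing are examples only
and will differ on your hardware:

#!/usr/bin/env python
# Sketch only: push IPMI temperature and fan readings into ganglia
# with gmetric.  Adjust the parsing and sensor names for your
# ipmitool output.

import subprocess

GMETRIC = "/usr/bin/gmetric"   # assumed install path


def ipmi_readings(sensor_type):
    """Return {sensor_name: value} for one 'ipmitool sdr type' category."""
    proc = subprocess.Popen(["ipmitool", "sdr", "type", sensor_type],
                            stdout=subprocess.PIPE, universal_newlines=True)
    out = proc.communicate()[0]
    readings = {}
    for line in out.splitlines():
        # typical line: "CPU1 Temp | 30h | ok | 3.1 | 42 degrees C"
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue
        try:
            readings[fields[0]] = float(fields[4].split()[0])
        except (ValueError, IndexError):
            continue   # "no reading" or disabled sensor
    return readings


def push(name, value, units):
    """Hand one value to gmond via the gmetric command-line tool."""
    subprocess.call([GMETRIC, "--name=" + name, "--value=" + str(value),
                     "--type=float", "--units=" + units])


if __name__ == "__main__":
    for sensor, value in ipmi_readings("Temperature").items():
        push("ipmi_" + sensor.lower().replace(" ", "_"), value, "Celsius")
    for sensor, value in ipmi_readings("Fan").items():
        push("ipmi_" + sensor.lower().replace(" ", "_"), value, "RPM")

Run something like that from cron every minute or two on each node;
gmetric uses the same send channel gmond does, so the readings travel
over the same unicast path as the built-in metrics.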
I'd suggest installing it; if you still think it is too chatty, remove
it and look for another option. I find it useful in that you can see
when a node died, what the load on the node was when it crashed, what
the network traffic is, etc.
I also use cacti - but only for the head servers, switches, etc. I find
it has too much overhead for the nodes. It is, however, useful in that
it can send emails to alert you to problems, and it allows for graphing
of SNMP devices.
Craig.
Gerry Creager wrote:
> Now, for the flame-bait. Bernard suggests cacti and/or ganglia to
> handle this. Our group has heard some mutterings that ganglia is a
> "chatty" application and could cause some potential hits on our 1 GbE
> interconnect fabric.
>
> A little background on our current implementation: 126 dual quad-core
> Xeon Dell 1950s interconnected with gigabit ethernet. No, it's not
> the world's best MPI machine, but it should... and does... perform
> admirably for throughput applications where most jobs can be run on a
> node (or two) but which don't use MPI as much as, e.g., OpenMP, or in
> some cases, even run on a single core but use all the RAM.
>
> So, we're worried a bit about having everything talk on the same
> gigabit backplane, hence, so far, no ganglia.
>
> What are the issues I might want to worry about in this regard,
> especially as we expand this cluster to more nodes (potentially going
> to 2k cores, or essentially doubling)?