[Beowulf] RRDtools graphs of temp from IPMI
Craig West cwest at astro.umass.edu
Sat Nov 8 21:45:18 PST 2008
Gerry,

Like others, I too use ganglia, and have a custom script which reports CPU temps (and fan speeds) for the nodes. However, I changed ganglia's default method of communication (multicast) to reduce the chatter. I use a unicast setup, where each node reports directly to the monitoring server, which is a dedicated machine for monitoring all the systems and performing other tasks (DHCP, NTP, imaging, etc.).

Each node uses less than 1 KB/sec to transmit all the ganglia information, including my extra metrics. For the useful recorded information you get from this data, it's worth the rather small amount of network chatter. You can tune the metrics further: turn off the ones you don't want, or have them report less often.

I'd suggest installing it; if you still think it is chatty, then remove it and look for another option. I find it useful in that you can see when a node died, what the load on the node was when it crashed, what the network traffic was, etc.

I also use cacti, but only for the head servers, switches, etc. I find it has too much overhead for the nodes. It is, however, useful in that it can send emails to alert you to problems, and it allows for graphing of SNMP devices.

Craig.

Gerry Creager wrote:
> Now, for the flame-bait. Bernard suggests cacti and/or ganglia to
> handle this. Our group have heard some mutterings that ganglia is a
> "chatty" application and could cause some potential hits on our 1 GbE
> interconnect fabric.
>
> A little background on our current implementation: 126 dual-quad core
> Xeon Dell 1950's interconnected with gigabit ethernet. No, it's not
> the world's best MPI machine, but it should... and does... perform
> admirably for throughput applications where most jobs can be run on a
> node (or two) but which don't use MPI as much as, e.g., OpenMP, or in
> some cases, even run on a single core but use all the RAM.
>
> So, we're worried a bit about having everything talk on the same
> gigabit backplane, hence, so far, no ganglia.
>
> What are the issues I might want to worry about in this regard,
> especially as we expand this cluster to more nodes (potentially going
> to 2k cores, or, essentially doubling)?
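
For anyone wanting to try the custom-metric approach described in the reply above, here is a minimal sketch of what such a script could look like. It is not the script from the post. It assumes temperatures arrive as pipe-separated text in the style of `ipmitool sdr type Temperature` (sensor names and columns vary by vendor), and that each reading is handed to ganglia's `gmetric` command-line tool. The sample data, the function names, and the metric naming scheme are all hypothetical.

```python
import shlex
import subprocess

# Hypothetical sample in the pipe-separated layout of
# `ipmitool sdr type Temperature`; real sensor names differ by vendor.
SAMPLE = """\
CPU1 Temp        | 30h | ok  |  3.1 | 45 degrees C
CPU2 Temp        | 31h | ok  |  3.2 | 47 degrees C
Ambient Temp     | 08h | ns  |  7.1 | Disabled
"""


def parse_temps(text):
    """Return {sensor_name: degrees_C} for rows that carry a Celsius reading."""
    temps = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue  # not a sensor row
        name, reading = fields[0], fields[4]
        parts = reading.split()
        if len(parts) >= 3 and parts[1] == "degrees" and parts[2] == "C":
            try:
                temps[name] = float(parts[0])
            except ValueError:
                continue  # unreadable value, skip the sensor
    return temps


def gmetric_command(name, value):
    """Build the gmetric argument list for one temperature reading."""
    metric = name.lower().replace(" ", "_")  # assumed naming scheme
    return ["gmetric", "--name", metric, "--value", f"{value:.1f}",
            "--type", "float", "--units", "Celsius"]


def report(text, run=False):
    """Print the gmetric commands, or execute them when run=True."""
    commands = [gmetric_command(n, v) for n, v in sorted(parse_temps(text).items())]
    for cmd in commands:
        if run:
            subprocess.run(cmd, check=True)  # requires gmetric on the node
        else:
            print(shlex.join(cmd))
    return commands


if __name__ == "__main__":
    report(SAMPLE)
```

Run from cron on each node with real sensor output in place of SAMPLE and `run=True`, something like this would add a couple of small metrics per interval on top of the defaults. Lengthening the cron interval is one way to trade resolution for less traffic.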
