[Beowulf] Re: recommendation on crash cart for a cluster room: full cluster KVM is not an option I suppose?

Rahul Nabar rpnabar at gmail.com
Fri Oct 9 10:17:59 PDT 2009


On Thu, Oct 8, 2009 at 5:55 PM, Greg Lindahl <lindahl at pbm.com> wrote:

>
>
> 1) Console logging. Your machine just crashed. No clue in
> /var/log/messages. "I wonder if it printed something on the console?"
> Answer: ipmi and conman (available in an rpm in Red Hat distros).

I was "planning" on using kdump and a crash kernel for that. Note the
emphasis on "planning": I never got it working correctly. I started
on kdump+kexec after hitting exactly the same "node crashes for unknown
reasons and I have no output" problem.
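
For anyone debugging a similar kdump setup, here is a minimal sketch of a
first sanity check: is memory reserved for the crash kernel, and is a crash
kernel actually loaded? It assumes a Linux node that exposes /proc/cmdline
and /sys/kernel/kexec_crash_loaded. The function names are made up for
illustration, and this is not a full kdump diagnosis.

```python
import re


def crashkernel_setting(cmdline):
    """Return the value of crashkernel= on a kernel command line, or None."""
    match = re.search(r"(?:^|\s)crashkernel=(\S+)", cmdline)
    return match.group(1) if match else None


def kdump_status(cmdline, crash_loaded):
    """Summarize kdump readiness from the command line and the loaded flag."""
    reserved = crashkernel_setting(cmdline)
    if reserved is None:
        return "no crashkernel= reservation on the kernel command line"
    if not crash_loaded:
        return "crashkernel=%s reserved, but no crash kernel is loaded" % reserved
    return "crashkernel=%s reserved and a crash kernel is loaded" % reserved


def read_text(path):
    """Read a small text file, returning '' if it is missing or unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return ""


if __name__ == "__main__":
    # Assumption: both files exist on the node being checked.
    cmdline = read_text("/proc/cmdline")
    loaded = read_text("/sys/kernel/kexec_crash_loaded") == "1"
    print(kdump_status(cmdline, loaded))
```

If the first message comes back, the crash kernel never had memory to load
into, which is one common reason a kdump setup silently does nothing.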

Maybe IPMI gives you the same functionality. Interesting point for me
though: what are the pros and cons of IPMI console logging versus kdump
in such crash scenarios? Are they competitors? Is one better or easier
than the other?

> 2) Monitoring. Temp, fan speeds, power supply state, events. Answers
> the "why is the little red light on the front of the case lit?"
> question. You can get some of this via other software (lm_sensors),
> but I find ipmitool to suck less, and ipmitool accurately answers the
> red light question -- lm_sensors can only guess.

I see. Yes, you read me correctly: I was putting full faith in
lm_sensors to do this. Currently I have lm_sensors feeding
temperatures to my Nagios monitoring setup, and it has been working fine.
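
To make the comparison concrete, here is a minimal sketch of a Nagios-style
temperature check. It is not my actual plugin. It assumes sensor readings
arrive as "name | reading | status" lines, roughly the layout that
`ipmitool sdr` prints; the sample text, the function names, and the 70/80 C
thresholds are all made up for illustration. The exit codes (0 OK, 1 WARNING,
2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention.

```python
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical sample input; real sensor names and values differ per board.
SAMPLE = """\
CPU Temp         | 45 degrees C      | ok
Sys Temp         | 31 degrees C      | ok
FAN 1            | 8400 RPM          | ok
PS Status        | 0x01              | ok
"""


def parse_temps(text):
    """Return {sensor name: degrees C} for lines that carry a temperature."""
    temps = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 2 or not fields[1].endswith("degrees C"):
            continue  # not a temperature reading (fan, status, no reading)
        try:
            temps[fields[0]] = float(fields[1].split()[0])
        except ValueError:
            continue  # reading was not numeric
    return temps


def check(temps, warn=70.0, crit=80.0):
    """Return (exit code, status line) based on the hottest sensor."""
    if not temps:
        return UNKNOWN, "UNKNOWN - no temperature readings found"
    name, hottest = max(temps.items(), key=lambda item: item[1])
    if hottest >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif hottest >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, "%s - %s at %.0f C" % (label, name, hottest)


if __name__ == "__main__":
    code, message = check(parse_temps(SAMPLE))
    print(message)
    sys.exit(code)
```

In a real check, the sample text would be replaced by the output of whichever
tool is being trusted; only the parsing step would need to change.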

But I didn't grasp the practical reason why lm_sensors sucks more
than IPMI. That's interesting again: aren't they taking data from the
same bus or counters? Or is this because the sensor details tend to be
proprietary, so lm_sensors lags behind the vendor implementations of
IPMI?

Because if open-source IPMI is also trying to log sensor stats, it's in
competition with open-source lm_sensors (not to say it is bad or
unheard of for multiple open-source projects to get the same thing
done!)

-- 
Rahul


