[Beowulf] Re: recommendation on crash cart for a cluster room: full cluster KVM is not an option I suppose?

Mark Hahn hahn at mcmaster.ca
Fri Oct 9 14:27:09 PDT 2009


> in such crash scenarios. Are they competitors? Is one better / easier
> than the other?

they're different.  and I've never actually seen kdump in use.
but logging (remote syslog, syslog-ng, netconsole, ipmi-sol, etc)
is something everyone does to varying degrees and more is better.
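a minimal sketch of the remote-syslog end of that list, using Python's standard logging module. the function name, logger name, and port default are illustrative assumptions, not anything from a particular site's setup; it only shows a node shipping its messages to a central log host over UDP.

```python
import logging
import logging.handlers


def make_remote_logger(host, port=514, name="node"):
    """Return a logger that forwards records to a remote syslog daemon.

    host/port identify the central log server (hypothetical values are
    supplied by the caller). UDP is SysLogHandler's default transport,
    so a dead log server never blocks the node.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    # Prefix each message with the logger name so the log host can
    # tell nodes or subsystems apart.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

usage would be something like make_remote_logger("loghost").warning("fan 3 below threshold"), where "loghost" is a placeholder for whatever your log server is called. note that this is still in-band: it only works while the node is healthy enough to run userspace.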

>> 2) Monitoring. Temp, fan speeds, power supply state, events. Answers
>> the "why is the little red light on the front of the case lit?"
>> question. You can get some of this via other software (lm_sensors),
>> but I find ipmitool to suck less, and ipmitool accurately answers the
>> red light question -- lm_sensors can only guess.
>
> I see. Yes, you read me correctly: I was putting full faith in
> lm_sensors to do this. Currently I have lm_sensors feeding
> temperatures to my nagios monitoring setup and it has been working fine.

lm_sensors is in-band, in that it consumes cycles on your node,
and doesn't help you if your node isn't working right.  IPMI is OOB,
has no performance effect and works regardless of power state, panic, etc.
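a rough sketch of what OOB polling can look like from a monitoring host, assuming ipmitool is installed there and the BMC is reachable over the network. the SAMPLE text is made up for illustration (not captured from real hardware), the helper names are hypothetical, and the status codes handled ("ok", "ns", anything else) are an assumption about typical `ipmitool sdr list` output.

```python
import subprocess

# Made-up sample in the "name | reading | status" layout that
# `ipmitool sdr list` prints; not taken from a real machine.
SAMPLE = """\
CPU Temp         | 45 degrees C      | ok
FAN 1            | 5400 RPM          | ok
FAN 2            | no reading        | ns
PS2 Status       | 0x02              | cr
"""


def parse_sdr(text):
    """Turn sdr-style lines into {sensor name: (reading, status)}."""
    sensors = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, reading, status = parts
        sensors[name] = (reading, status)
    return sensors


def not_ok(sensors):
    """Return sensors whose status is neither "ok" nor "ns" (no reading)."""
    return {n: rs for n, rs in sensors.items() if rs[1] not in ("ok", "ns")}


def read_bmc(host, user):
    """Query a remote BMC; runs on the monitoring host, not the node.

    -E makes ipmitool take the password from the IPMI_PASSWORD
    environment variable instead of the command line.
    """
    out = subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
         "sdr", "list"],
        capture_output=True, text=True, check=True, timeout=30,
    ).stdout
    return parse_sdr(out)
```

not_ok(parse_sdr(SAMPLE)) picks out only the PS2 Status entry, which is the sort of thing you would hand to nagios. none of this runs on the compute node itself, so it costs the node nothing and keeps working when the OS has panicked.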

> But I didn't grasp a practical point about lm_sensors sucking more
> than IPMI. That's interesting again: Aren't they taking data from the
> same bus or counters? Or is this because the sensor details tend to be
> proprietary so lm_sensors lags behind the vendor implementations of
> IPMI?

lm_sensors is _more_ flexible because it can, for instance, probe components
that the BMC doesn't know about.  I'm thinking of SPD on DIMMs, which is
gettable over the I2C bus, but I've never seen an IPMI implementation mess
with it.  there are other I2C devices too (some video cards?).

the more your monitoring can be OOB, the better.  not just because it
doesn't perturb the node, but also because it's less fragile.


