[Beowulf] Re: recommendation on crash cart for a cluster room: full cluster KVM is not an option I suppose?

John Hearns hearnsj at googlemail.com
Thu Oct 8 22:54:44 PDT 2009

2009/10/8 Greg Lindahl <lindahl at pbm.com>:

> You haven't mentioned the other things you can use IPMI for.
> 1) Console logging. Your machine just crashed. No clue in
> /var/log/messages. "I wonder if it printed something on the console?"
> Answer: ipmi and conman (available in an rpm in Red Hat distros).
> 2) Monitoring. Temp, fan speeds, power supply state, events. Answers
> the "why is the little red light on the front of the case lit?"
> question.

At the risk of getting a reputation in these parts, both come as
standard on the SGI ICE cluster. Console logging runs via IPMI/conman on
the rack leader for all nodes, and the logs are then mounted across to
the admin node.
Temp, fan speed etc. are logged on the rack leaders and reported via ESP
monitoring. Ganglia is implemented too.
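
For anyone without that kind of integrated stack, the monitoring side can
be roughed out in a few lines. This is only a sketch, not what SGI ships:
it assumes the pipe-delimited layout that "ipmitool sensor" prints (name,
reading, unit, status, then thresholds), and the function names and the
sample text are made up for illustration.

```python
def parse_sensor_output(text):
    """Return (name, value, unit, status) tuples for numeric sensors.

    Assumes pipe-delimited lines as printed by `ipmitool sensor`.
    Sensors with a non-numeric reading ("na", or discrete sensors
    reporting hex such as "0x1") are skipped.
    """
    readings = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 4:
            continue  # not a sensor line
        name, raw_value, unit, status = fields[:4]
        try:
            value = float(raw_value)
        except ValueError:
            continue  # absent or discrete sensor
        readings.append((name, value, unit, status))
    return readings


def not_ok(readings):
    """Pick out sensors whose status column is anything other than 'ok'."""
    return [r for r in readings if r[3] != "ok"]


# Illustrative sample only; real sensor names and values vary by board.
SAMPLE = """\
CPU Temp   | 45.000 | degrees C | ok     | na | na | na | 80.000 | 85.000 | 90.000
FAN1       | 0.000  | RPM       | cr     | na | 500.000 | na | na | na | na
PS1 Status | 0x1    | discrete  | 0x0100 | na | na | na | na | na | na
"""

if __name__ == "__main__":
    for name, value, unit, status in not_ok(parse_sensor_output(SAMPLE)):
        print(f"{name}: {value} {unit} [{status}]")
```

On a real node the text would come from running ipmitool itself (for
example through subprocess) rather than from a canned string, and anything
returned by not_ok() is a candidate answer to the "little red light"
question.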

