[Beowulf] recommendation on crash cart for a cluster room: full cluster KVM is not an option I suppose?

Joe Landman landman at scalableinformatics.com
Sat Oct 3 10:13:58 PDT 2009

Rahul Nabar wrote:
> On Sat, Oct 3, 2009 at 11:54 AM, Joe Landman
> <landman at scalableinformatics.com> wrote:
>> If I were building a cluster of anything more than 4 machines (not racks,
>> machines), I would be insisting upon IPMI 2.0 with a working SOL and kvm
>> over IP capability built in.
> Thanks for those tips, Joe. I am already convinced by all the posts on
> the list that IPMI is a must. No other way. All you guys seem pretty
> unanimous about that much!
>> For the 250-300 machine system you are looking at, you *want* IPMI 2.0 with
>> KVM over IP.  You *want* switched remotely accessible PDUs, for those times
>> when IPMI itself gets wedged (rarer these days, but it does still happen).
>>  IMO you *want* this IPMI on a separate network. You *want* a serial
>> concentrator type system to provide a redundant path in the event of an IPMI
>> failure.  Problems don't go away just because IPMI stopped working.  You
>> *need* an inexpensive crash cart that just works, and plugs into your PDUs.
> I see, thanks for disabusing me of my notion of "ipmi" as one
> monolithic all-or-none creature. From what you write (and my online
> reading) it seems there are several discrete parts:
> IPMI 2.0
> switched remotely accessible PDUs
> "serial concentrator type system"
> Correct me if I am wrong, but these are all "options", and varying
> vendors and implementations will offer parts, all, or none of these?


> Or is it that when one says "IPMI 2" it includes all these features. I

IPMI 2.0 includes
	* local power control (on-off switch in software)
	* Serial-over-lan
	* system sensor inspection

It *may* contain kvm over IP (the clusters we build do).
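To make the three core capabilities concrete, here is a minimal Python sketch that only *builds* the matching command lines for ipmitool, one common open-source IPMI client; it does not run them. The BMC address and user name are hypothetical placeholders, the BMC is assumed to sit on a separate management network, and the password is assumed to be exported in the IPMI_PASSWORD environment variable (ipmitool's -E option). KVM over IP is vendor-specific and has no equivalent here.

```python
import shlex

# Hypothetical BMC address and user; replace with your own.
# The BMC is assumed to be reachable on a separate management network.
BMC_HOST = "10.10.0.101"
BMC_USER = "admin"


def ipmitool_cmd(host: str, user: str, *args: str) -> list[str]:
    """Build an ipmitool argv using the IPMI 2.0 'lanplus' (RMCP+) interface.

    -E tells ipmitool to read the password from the IPMI_PASSWORD
    environment variable instead of putting it on the command line.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *args]


# One example command per IPMI 2.0 capability listed above.
COMMANDS = {
    "power control": ("chassis", "power", "status"),  # also: on, off, cycle
    "serial-over-LAN": ("sol", "activate"),
    "sensor inspection": ("sdr", "list"),
}

if __name__ == "__main__":
    for capability, args in COMMANDS.items():
        argv = ipmitool_cmd(BMC_HOST, BMC_USER, *args)
        print(f"{capability}: {shlex.join(argv)}")
```

Returning an argv list rather than a shell string keeps the commands safe to hand to `subprocess.run` later without shell quoting issues.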

> did read online, but these implementations seem vendor-specific, so it's
> hard to translate jargon across vendors; e.g., for Dell they are called
> DRACs, etc.

At minimum, IPMI 2.0 is a must.  Some DRAC levels also provide KVM 
over IP, though at additional cost.

> Finally, what's a "serial concentrator"? Isn't that the same as the
> SOL that Skylar was explaining to me? Or is that something different
> too?

Something different.  A serial concentrator is a machine you can ssh 
into that provides N serial ports.  It is distinct from the IPMI SOL 
capability: it is a second, non-IPMI management channel.  For large 
systems, I'd recommend multiple administrative paths ...
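One way to think about "multiple administrative paths" is as an ordered fallback list: try IPMI first, then the serial concentrator, then the switched PDU, which can only power-cycle the outlet. The sketch below is a hypothetical model, not any vendor's tool; `MgmtPath`, `first_working_path`, and the probe functions are illustrative names, and the probes are stubs you would replace with real checks (for example an ipmitool query or an ssh connection to the concentrator).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MgmtPath:
    """One administrative path to a node (hypothetical model)."""
    name: str
    probe: Callable[[str], bool]  # True if this path can currently reach the node


def first_working_path(node: str, paths: list[MgmtPath]) -> Optional[str]:
    """Return the name of the first path that answers for `node`, or None."""
    for path in paths:
        try:
            if path.probe(node):
                return path.name
        except Exception:
            # A probe that errors out (timeout, refused connection, ...) is
            # treated like a failed probe: fall through to the next path.
            continue
    return None


if __name__ == "__main__":
    # Stub probes for illustration: IPMI is wedged on node017,
    # but its serial concentrator port still works.
    def ipmi_probe(node: str) -> bool:
        return node != "node017"

    def serial_probe(node: str) -> bool:
        return True

    def pdu_probe(node: str) -> bool:
        return True  # last resort: can only power-cycle the outlet

    paths = [
        MgmtPath("IPMI (SOL / KVM over IP)", ipmi_probe),
        MgmtPath("serial concentrator", serial_probe),
        MgmtPath("switched PDU", pdu_probe),
    ]
    for node in ("node016", "node017"):
        print(node, "->", first_working_path(node, paths))
```

Running it prints `node016 -> IPMI (SOL / KVM over IP)` and `node017 -> serial concentrator`: the point is that losing one channel degrades access rather than removing it.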


