[Beowulf] recommendation on crash cart for a cluster room: full cluster KVM is not an option I suppose?
Joe Landman
landman at scalableinformatics.com
Sat Oct 3 09:54:57 PDT 2009
Rahul Nabar wrote:
> On Fri, Oct 2, 2009 at 10:13 PM, Skylar Thompson <skylar at cs.earlham.edu> wrote:
>> Rahul Nabar wrote:
>>> On Wed, Sep 30, 2009 at 9:09 AM, Joe Landman
>>> <landman at scalableinformatics.com> wrote:
>
>
>> In addition to the console, the other really useful feature of IPMI is
>> remote power cycling. That's useful when the console itself is totally
>> wedged.
>>
>
> True. That's a useful feature. But that "could" be done by sending
> "magic packets" to a eth card as well, right? I say "can" because I
> don't have that running on all my servers but had toyed with that on
> some. I guess, just many ways of doing the same thing.
Hmmm...
If I were building a cluster of anything more than 4 machines (not
racks, machines), I would insist upon IPMI 2.0 with working SOL
(serial-over-LAN) and KVM over IP capability built in.
For the 250-300 machine system you are looking at, you *want* IPMI 2.0
with KVM over IP. You *want* switched remotely accessible PDUs, for
those times when IPMI itself gets wedged (rarer these days, but it does
still happen). IMO you *want* this IPMI on a separate network. You
*want* a serial concentrator type system to provide a redundant path in
the event of an IPMI failure. Problems don't go away just because IPMI
stopped working. You *need* an inexpensive crash cart that just works,
and plugs into your PDUs.
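For readers who haven't driven IPMI 2.0 from the command line, here is a minimal sketch of what SOL and remote power cycling look like with the stock `ipmitool` CLI over the `lanplus` (IPMI 2.0) interface. Assumptions, labeled loudly: the `<node>-ipmi` hostname convention for BMCs on a separate management network, the `admin` user, and the helper names `bmc_host`, `ipmitool_cmd`, `sol_console` and `power_cycle` are all hypothetical. The sketch only builds the command lines; it does not run them.

```python
import shlex
from typing import List


def bmc_host(node: str) -> str:
    # HYPOTHETICAL naming convention: each node's BMC answers as "<node>-ipmi"
    # on a dedicated management network. Adjust to your site.
    return f"{node}-ipmi"


def ipmitool_cmd(node: str, user: str, *action: str) -> List[str]:
    """Build an ipmitool argv for an IPMI 2.0 (lanplus) session.

    -E makes ipmitool read the password from the IPMI_PASSWORD environment
    variable, so it does not show up in the process list.
    """
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host(node),
            "-U", user, "-E", *action]


def sol_console(node: str, user: str) -> List[str]:
    # Serial-over-LAN console: works even when the node's network stack is hung.
    return ipmitool_cmd(node, user, "sol", "activate")


def power_cycle(node: str, user: str) -> List[str]:
    # Hard power cycle via the BMC, for when the console itself is wedged.
    return ipmitool_cmd(node, user, "chassis", "power", "cycle")


if __name__ == "__main__":
    print(shlex.join(sol_console("node001", "admin")))
    # prints: ipmitool -I lanplus -H node001-ipmi -U admin -E sol activate
```

Keeping the BMC hostnames on their own management network is what lets these commands keep working when the production network is the thing that broke. When the BMC itself is wedged, you fall back to the switched PDU or serial concentrator.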
Understand that administration time could scale linearly with the number
of nodes if you are not careful, so you want to (carefully) use tools
which significantly help reduce administrative load. IPMI 2.0 is one
such tool.
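To make the scaling point concrete: with IPMI, checking 300 nodes is one scripted fan-out rather than 300 trips with the crash cart. The sketch below is illustrative only. It assumes the same hypothetical `<node>-ipmi` BMC naming, uses `ipmitool ... chassis power status`, and runs queries in parallel threads. The runner is injectable so you can dry-run it without ipmitool or real hardware; `status_cmd`, `run_ipmitool` and `power_status_all` are hypothetical names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def status_cmd(node: str, user: str) -> List[str]:
    # HYPOTHETICAL "<node>-ipmi" BMC naming; -E reads IPMI_PASSWORD from the env.
    return ["ipmitool", "-I", "lanplus", "-H", f"{node}-ipmi",
            "-U", user, "-E", "chassis", "power", "status"]


def run_ipmitool(argv: List[str]) -> str:
    # Real runner: needs ipmitool installed and the BMCs reachable.
    try:
        r = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"ERROR: {exc}"
    return r.stdout.strip() if r.returncode == 0 else f"ERROR: {r.stderr.strip()}"


def power_status_all(nodes: Iterable[str], user: str,
                     runner: Callable[[List[str]], str] = run_ipmitool,
                     workers: int = 32) -> Dict[str, str]:
    nodes = list(nodes)  # iterated twice below, so materialize it
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda n: runner(status_cmd(n, user)), nodes)
        return dict(zip(nodes, results))


if __name__ == "__main__":
    nodes = [f"node{i:03d}" for i in range(1, 301)]
    # Dry run: a fake runner that just reports which BMC it would query.
    statuses = power_status_all(nodes, "admin", runner=lambda argv: f"would query {argv[4]}")
    print(statuses["node001"])  # prints: would query node001-ipmi
```

Threads are used because each query mostly waits on the network. The worker count of 32 is an arbitrary assumption; tune it to what your management network and BMCs tolerate.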
Sending "magic" bytes to an Ethernet port won't work if the OS/machine
is wedged. You are (likely) thinking of Wake-on-LAN, i.e. power-on when
a particular packet shows up on the LAN. This is a very different beast.
If you could simply toggle the power state of a server by sending "magic"
bytes to the eth port, lots of people would be very unhappy about the
never-ending denial-of-service attack this opens up.
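For comparison, a minimal sketch of what a Wake-on-LAN "magic packet" actually contains: 6 bytes of 0xFF followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast (port 9 is a common choice, assumed here). Note what is missing: there is no power-off or reset request, and no credential beyond the MAC itself. The function names `magic_packet` and `send_wol` are illustrative, not from any particular tool.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Wake-on-LAN payload: 6 x 0xFF, then the 6-byte MAC repeated 16 times."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return b"\xff" * 6 + raw * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    # ASSUMPTION: limited broadcast on UDP port 9; some setups use port 7 or a
    # subnet-directed broadcast address instead.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))


if __name__ == "__main__":
    print(len(magic_packet("00:11:22:33:44:55")))  # prints: 102
```

The packet can only ask a sleeping or soft-off machine to power on, so it cannot rescue a hung node. Anyone on the segment who knows the MAC can forge it, which is exactly why nobody wants the same mechanism to be able to turn machines off.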
Take it as a given that you want functional IPMI 2.0 with operational
SOL, and you really do want remote KVM over IP built in. The latter is
my opinion, but it is again based on experience over the last decade+ in
building/supporting these things.
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615