[Beowulf] recommendation on crash cart for a cluster room: full cluster KVM is not an option I suppose?
landman at scalableinformatics.com
Wed Sep 30 07:09:23 PDT 2009
Rahul Nabar wrote:
> On Wed, Sep 30, 2009 at 8:19 AM, Hearns, John <john.hearns at mclaren.com> wrote:
>> It depends. Supermicro use the shared-socket approach (actually it is a bridge
>> somewhere on the motherboard), or with Supermicro you can have a separate
>> socket using a little cable with a mini-USB connector onto the IPMI card.
>> Other manufacturers use (a) or (b).
>> On a blade setup the IPMI is carried over the backplane Ethernet links.
>> If you have a separate IPMI network (ILOM, DRAC, whatever they call it) you
>> do not need the same type of switches. What you need is some cheap 10/100 switches,
>> one in each rack. Say Netgear or D-Link. Not a central switch with a huge backbone capacity.
>> Then you just connect the switches together in a loop.
> I like the shared socket approach. Building a separate IPMI network
> seems a lot of extra wiring to me. Admittedly the IPMI switches can be
Allow me to point out the contrary view.
After years of configuring and helping run/manage both, we recommend
strongly *against* the shared physical connector approach. The extra
cost/hassle of the extra cheap switch and wires is well worth the money.
Why do we take this view? Many reasons, but some of the bigger ones are
a) when the OS takes the port down, your IPMI no longer responds to ARP
requests. That means ping, and any other service (IPMI itself), will fail
unless you continuously refresh the ARP tables or hardwire those IPs to
those MAC addresses.
b) IPMI stack bugs (what ... you haven't seen any? you must not be
using IPMI ...). My favorite in recent memory (over the last year) was
one where IPMI did a DHCP request and got itself wedged into a strange
state. To unwedge it, we had to disconnect the IPMI network port, issue
an mc reset cold, wait, and then plug it back in. Hard to do when the
eth0 and IPMI share the same port.
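
For illustration only, here is a minimal Python sketch of the two workarounds mentioned in (a) and (b): pinning the IP-to-MAC mapping with a permanent neighbor entry, and issuing a cold reset to the management controller. It only builds the command lines; it does not run them. The node name, addresses, interface name, user name and password file are hypothetical placeholders, and the choice of the iproute2 `ip neigh` command and the `lanplus` interface (which needs IPMI 2.0) is an assumption, not something the hardware above is known to require. In the wedged-DHCP case described in (b) the network port was unplugged, so the reset would have to be issued in-band from the host, which is the `host=None` form.

```python
import ipaddress
import re
import shlex

# Strict colon-separated MAC format, lower case.
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def static_arp_command(ip, mac, dev):
    """Build an iproute2 command that pins ip -> mac permanently on dev."""
    ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    mac = mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"bad MAC address: {mac!r}")
    return ["ip", "neigh", "replace", ip, "lladdr", mac,
            "nud", "permanent", "dev", dev]


def bmc_cold_reset_command(host=None, user=None, password_file=None):
    """Build an ipmitool 'mc reset cold' command.

    host=None  -> in-band reset from the node's own OS (no network needed).
    host given -> reset over the LAN; user and password_file are required.
    """
    cmd = ["ipmitool"]
    if host is not None:
        if user is None or password_file is None:
            raise ValueError("remote reset needs user and password_file")
        cmd += ["-I", "lanplus", "-H", host, "-U", user, "-f", password_file]
    return cmd + ["mc", "reset", "cold"]


if __name__ == "__main__":
    # Hypothetical inventory: BMC name -> (IP, MAC). Replace with real data.
    bmcs = {"node01-ipmi": ("10.0.1.1", "00:25:90:AA:BB:01")}
    for name, (ip, mac) in bmcs.items():
        # "eth1" is an assumed interface on the management node.
        print(shlex.join(static_arp_command(ip, mac, "eth1")))
    print(shlex.join(bmc_cold_reset_command()))
```

Neither command removes the underlying problem. The static entries have to be maintained by hand for every node, and the in-band reset needs a working OS on the node, which is exactly what you may not have when you reach for IPMI.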
Of course I could also talk about the SOL (serial over lan) which didn't
Short version, we advise everyone, including some on this list, to
always use a second independent IPMI network. We make sure that anyone
insisting upon a shared one really truly understands what they are in for.
I want to emphasize this. It is, in my opinion, one of the many false
savings you can make in cluster design, to pull out the extra switch and
wires for IPMI. It's a false saving, in that you will likely eat up the
cost/effort difference between the two variants in terms of excess
labor, self-hair removal, ...
Really ... it's not worth the pain. Go with two nets.
FWIW: most of the server class Supermicro boards (the Nehalems) now come
with IPMI and KVM over IP built in, on a separate NIC. Some do share
the NIC; we simply avoid using those boards in most cases.
Note also: for real lights out capability, we configure alternative
management paths. Again, it saves you time/effort/resources down the
road for a modest/minimal investment up front. Switched PDUs and a
serial port concentrator (or our management node with lots of serial
ports ...). It makes life *sooo* much better when "b" strikes, and you
need to de-wedgify a node or three, and you are too far to drive in.
There is lots to be said for real lights out capability. Park one crash
cart in a corner, and hope you will never have to use it.
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615