[Beowulf] Best Practices SOL vs Cyclades ACS

Marian Marinov mm at yuhu.biz
Fri Oct 9 22:33:23 PDT 2009

On Saturday 10 October 2009 08:09:45 Mark Hahn wrote:
> > We have more than 400 machines. Every month there is one machine that we
> > cannot reboot using IPMI, or on which SOL is not working.
> we have something like 2500 nodes, mostly HP dl145g2's, and have a
> BMC-wedge probably 6-12 times/year.  can I ask what brand/model has such
> flakey IPMI? if you run "ipmi mc reset" on the node, does it resolve the
> problem? I wonder whether flakiness might also correspond to some config or
> usage pattern.  (ours dhcp from a local server - actually all the traffic
> is local.)

These are only Dell machines used for shared hosting. 

Usually these problems appear during a DoS/DDoS attack or under very high
system resource usage (for example, a load over 100 on a machine with 4 cores).
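
To make the "load over 100 on 4 cores" figure concrete, here is a minimal
Python sketch that normalizes the 1-minute load average by core count. The
function names and the threshold of 10 per core are assumptions chosen for
illustration; they are not something we run in production.

```python
import os


def load_per_core(load_1min, cores):
    """Return the 1-minute load average divided by the number of cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_1min / cores


def is_overloaded(load_1min, cores, threshold=10.0):
    """True when per-core load reaches the (hypothetical) threshold."""
    return load_per_core(load_1min, cores) >= threshold


def check_local_host(threshold=10.0):
    """Read the local load average (Unix only) and report overload."""
    load_1min, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return is_overloaded(load_1min, cores, threshold)
```

With the numbers above, load_per_core(100, 4) gives 25 runnable tasks per
core, well past any sensible threshold.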

Our problem is that in such situations IPMI is sometimes unreliable: you can
neither connect over serial nor reboot the machine.

Best regards,
Marian Marinov
