[Beowulf] Re: Building new cluster - estimate (Ivan Oleynik)

Maurice Hilarius maurice at harddata.com
Fri Aug 1 08:41:31 PDT 2008


Mark Hahn wrote:
>> BTW, where a lot of people are jumping on the "Get IPMI" bandwagon, 
>> I suggest getting PDUs with remote IP controlled ports is more useful.
>
> the thing I don't like about controlled PDUs is that they're pretty
> harsh - don't you expect a higher failure rate of node PSUs if you go 
> yanking the power this way?
Why?
If nodes shut down on command from the scheduler, that is good.
And if they do not, how is cutting power at the PDU socket any 
different from using the power switch on the node?
Obviously we want to avoid "dropping the hammer" on a mounted 
filesystem, at least until its cache has been flushed.
That is not hard to accomplish.
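
To make that ordering concrete, here is a minimal sketch in Python. It
assumes nothing about any particular scheduler or PDU: request_shutdown,
is_up and pdu_outlet_off are hypothetical callables you would supply
yourself (for example an ssh "shutdown -h now", a ping check, and
whatever interface your PDU offers), and the timeout values are
placeholders.

```python
import time
from typing import Callable


def power_off_node(
    node: str,
    request_shutdown: Callable[[str], None],  # hypothetical: ask the OS to halt
    is_up: Callable[[str], bool],             # hypothetical: e.g. a ping check
    pdu_outlet_off: Callable[[str], None],    # hypothetical: switch the outlet off
    timeout_s: float = 120.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try a clean shutdown first; cut PDU power only as the last step."""
    request_shutdown(node)

    # Give the node time to unmount filesystems and halt on its own.
    waited = 0.0
    while waited < timeout_s and is_up(node):
        sleep(poll_s)
        waited += poll_s

    clean = not is_up(node)

    # Switching the outlet off is harmless after a clean halt, and it is
    # the fallback when the node ignored the shutdown request.
    pdu_outlet_off(node)
    return "clean" if clean else "forced"
```

The return value only records whether the outlet was switched after a
clean halt or as a forced fallback, which is useful to log.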
>
> I have only seen a handful of different IPMI interfaces, but they all
> were reasonably reliable.
>
I have used the Supermicro, Tyan, ASUS, and Dell implementations, and 
they all had some tendency to choke sometimes.
The thing is, at the nominal cost of $50 to $100 per machine for BMC 
(IPMI) cards, one can buy a couple of network-controlled PDUs 
with thermal and humidity sensors.
As you are likely to buy at least "dumb" PDUs anyway, the typical 
cost added by this is usually around $30 per node, resulting in a 
tidy savings.
It also means you are "talking" to only one device per 10 to 30 nodes, 
versus 10 to 30 BMC devices.
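
As a rough worked example of that arithmetic, the sketch below uses
only the figures quoted above ($50 to $100 per BMC card, roughly $30
per node added by switched PDUs, 10 to 30 nodes per PDU). The function
names are made up for illustration, and the 64-node cluster in the
usage lines is an arbitrary example size.

```python
import math


def savings_per_node(bmc_cost: float, pdu_added_cost_per_node: float = 30.0) -> float:
    """Per-node saving from skipping the BMC card and paying the PDU uplift."""
    return bmc_cost - pdu_added_cost_per_node


def management_endpoints(nodes: int, nodes_per_pdu: int) -> int:
    """How many switched PDUs you talk to, instead of one BMC per node."""
    return math.ceil(nodes / nodes_per_pdu)


if __name__ == "__main__":
    nodes = 64  # arbitrary example cluster size
    for bmc_cost in (50.0, 100.0):
        total = nodes * savings_per_node(bmc_cost)
        print(f"BMC at ${bmc_cost:.0f}: about ${total:.0f} saved over {nodes} nodes")
    for per_pdu in (10, 30):
        print(f"{per_pdu} nodes per PDU: {management_endpoints(nodes, per_pdu)} PDUs "
              f"to manage, versus {nodes} BMCs")
```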

Further, these IPMI cards typically "steal" a GbE port on the nodes.


>> If you set your machines' BIOS to start on power up, it is trivial to 
>> stop and start machines with the PDU power, and that is definitely 
>> reliable.
>
> huh?  we're talking about network-attached IPMI, which is fully 
> independent of the controlled motherboard's bios.  are you talking 
> about those hybrid systems where the IPMI controller shares an 
> ethernet port with the host?  or IPMI through a kernel driver?
>
Either.
Most share a port, some have dedicated ports on board.

>> Plus, with a lot of those PDUs you can add thermal sensors and 
>> trigger power off on high temperature conditions.
>
> IPMI normally provides all the motherboard's sensors as well.  it 
> seems like those are far more relevant than the temp of the PDU...
I would rather monitor the room temperature at the racks, and shut the 
whole works down in a hurry if something is wrong, such as air 
conditioning failure.
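
A minimal sketch of that kind of trip logic follows. It deliberately
does not talk to any real PDU: the sensor readings are passed in as a
plain mapping, outlet_off is a hypothetical callable standing in for
however your PDU is controlled, and the 35 C limit in the usage lines
is an assumed example threshold, not a recommendation.

```python
from typing import Callable, Iterable, List, Mapping


def overheated_sensors(readings_c: Mapping[str, float], limit_c: float) -> List[str]:
    """Names of rack sensors at or above the limit, in sorted order."""
    return sorted(name for name, temp in readings_c.items() if temp >= limit_c)


def trip_if_overheated(
    readings_c: Mapping[str, float],
    limit_c: float,
    outlets: Iterable[str],
    outlet_off: Callable[[str], None],  # hypothetical PDU control hook
) -> bool:
    """Switch every listed outlet off if any rack sensor is over the limit."""
    if not overheated_sensors(readings_c, limit_c):
        return False
    for outlet in outlets:
        outlet_off(outlet)
    return True


if __name__ == "__main__":
    readings = {"rack1-top": 24.0, "rack2-top": 41.5}  # made-up sample data
    tripped = trip_if_overheated(readings, 35.0, ["pdu1/1", "pdu1/2"],
                                 lambda o: print("power off", o))
    print("tripped:", tripped)
```

In practice you would probably want outlet_off to attempt a clean
shutdown first and cut power only after a short grace period.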

> using lm_sensors is a poor substitute for IPMI.
Yes, and no.
For monitoring the temps and fans and such on nodes it is quite sufficient.
For power control it is useless, of course.
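
For the monitoring half, here is a small sketch of reading node
temperatures in-band. It assumes a Linux node whose sensor drivers
expose the kernel hwmon sysfs files (temp*_input, in millidegrees
Celsius), which is the same data lm_sensors reports; the function name
and the returned key format are my own choices.

```python
from pathlib import Path
from typing import Dict


def read_hwmon_temps(root: str = "/sys/class/hwmon") -> Dict[str, float]:
    """Return {"hwmonN/tempM_input": degrees_C} for every readable sensor."""
    temps: Dict[str, float] = {}
    for sensor in sorted(Path(root).glob("hwmon*/temp*_input")):
        try:
            millidegrees = int(sensor.read_text().strip())
        except (OSError, ValueError):
            # Some sensors fail to read or report junk; skip them.
            continue
        temps[f"{sensor.parent.name}/{sensor.name}"] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for name, temp_c in read_hwmon_temps().items():
        print(f"{name}: {temp_c:.1f} C")
```

Note that this only works while the node's OS is up, which is exactly
why it cannot replace out-of-band power control.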


-- 
With our best regards,

Maurice W. Hilarius         Telephone: 01-780-456-9771
Hard Data Ltd.              FAX:       01-780-456-9772
11060 - 166 Avenue          email: maurice at harddata.com
Edmonton, AB, Canada        http://www.harddata.com/
     T5X 1Y3