[Beowulf] non-proprietary IPMI card?
bill at cse.ucdavis.edu
Wed Nov 29 01:13:27 PST 2006
Michael Huntingdon wrote:
> When comparing cluster offerings, seems reasonable, that the additional
> $85-$100 would be factored in to any system/cluster purchase, for at
> least power up/down and reset?
Many? Yes. All? Not necessarily. In some cases this extra cost is a deal
killer. Keep in mind that the card is only part of the IPMI cost; usually you
also need a port on a switch, another U, another cat 5 cable, a server
motherboard (instead of a desktop), a server CPU (instead of a desktop CPU),
a server power supply, etc.
Sure, sometimes it's worth it, sometimes it's not. In any case it's
hard to do the calculation unless you have the costs (in $, watts, inches,
effects on uptime, and man hours).
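A rough sketch of that calculation in Python could look like the following.
The class, the function, and every input are hypothetical placeholders; only
the $85-$100 card price comes from this thread. Rack inches and uptime effects
are left out and would need their own terms.

```python
from dataclasses import dataclass


@dataclass
class PerNodeCost:
    """Extra per-node cost of one management option (all values are assumptions)."""
    dollars: float = 0.0      # hardware: card, share of a switch port, cable, ...
    watts: float = 0.0        # extra continuous power draw
    admin_hours: float = 0.0  # extra admin hours per year (negative if it saves time)


def total_dollars(cost, nodes, years, dollars_per_kwh, dollars_per_hour):
    """Total cost over the cluster's lifetime, converted to dollars."""
    hours_per_year = 24 * 365
    power = cost.watts / 1000 * hours_per_year * years * dollars_per_kwh
    labor = cost.admin_hours * years * dollars_per_hour
    return nodes * (cost.dollars + power + labor)
```

Run it once per option (IPMI card, switched PDU, nothing at all) with your own
quotes and compare the totals.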
> This is astonishing, or is there
> something I'm missing in this thread? The technology mentioned isn't
> really earth shattering.
Sure, it just seemed like a cool board. Say you want a $500 dual core node and
you don't want to wait a few months for the tier-1's to productize (and
increase the margin of) the rumored AMD cpus allegedly coming out on Dec 5th.
It's been awhile since I built a custom pile of PCs, but they have
> Additional system monitoring...like...internal temp...fan
> speed.....cpu/mem/disks, down to rotation rates is readily available.
> There is a very nice technology built into HP DLxxx systems that
> provides pre-failure analysis to system managers. Actually, from what
Not sure I see the benefit of pre-failure analysis. After all, before
the failure you can use smartd, lm_sensors, and related in-band data
collection utilities. Especially since the collecting and management
of said data is becoming so much easier with ganglia, cacti, and related
There are definitely cheaper ways to get on/off/reset than servers,
IPMI/server management cards, a second management network, and related glue.
Take the various network switchable PDUs. They usually cost $25 or so a node
more than the metered PDUs (you do monitor circuit loading, right?)
and only require a single network connection for each 16-21 machines.
The PDUs also have the benefit of handling all power management and
switching exactly the same for every device you have: servers, disk
arrays, GigE switches, Infiniband or Myrinet switches, consoles, fans, tape
drives, KVM, security cameras, etc.
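As a back-of-the-envelope sketch, the PDU numbers above work out like this in
Python. The function names are made up for illustration, the $25 premium and
the 16-21 outlet range are the rough figures from this post, and each device
is assumed to use a single outlet.

```python
import math


def pdu_count(devices, outlets_per_pdu=16):
    """Switched PDUs (one network connection each) needed to cover `devices`."""
    if outlets_per_pdu <= 0:
        raise ValueError("outlets_per_pdu must be positive")
    return math.ceil(devices / outlets_per_pdu)


def switched_pdu_premium(nodes, premium_per_node=25.0):
    """Extra cost of switched over metered PDUs, using the rough per-node figure."""
    return nodes * premium_per_node
```

For 128 devices that is 8 network connections with 16-outlet PDUs or 7 with
21-outlet PDUs, versus one management port per node for an IPMI card.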
I'm not saying I don't appreciate built in management processors,
out of band IPMI, and related technologies. Again, you have to be
informed to make the best decisions.
> I've seen and we've discussed, the goal is to extend the number of
> systems managed per system administrator to higher and higher numbers.
Indeed, but as mentioned above, sometimes the low tech, cheaper, and
universal approach (like the PDUs) saves more time/effort than the
elegant, more functional approach that only handles 90% of the problem
(like out of band IPMI solutions I've seen that only handled compute nodes).
Especially if said management system only works with a single vendors
> So if it's possible to extend the number of systems per system
> administrator, how about extending the number of systems per cabinet,
Great, just don't increase the power/heat density.
> and the number of cabinets per system administrator. I don't mean to
> minimize the job responsibilities of system managers. Quite the
Heh, I've no shortage of work, feel free to make my job easier.
> opposite, which further exacerbates the notion that in addition to the
> $85-$100....you might somehow have the desire to become a custom cable
> specialist? *BLOODY BRILLIANT....
I've hand made 12" cat 5 cables to build a daisy chained management network
(v20z's if you are curious); it saved me a switch and rack space I didn't
have. Sometimes it's worth it. Sure, I specify the elegant solution when I
can and it's worth the cost... but I've definitely seen many cluster proposals
from the tier-1 vendors that offered 1/2 the node count of proposals from the
smaller cluster vendors, which had a functional but not as fancy management
system. Granted, the tier-1 proposal had a fancy management network, but at the
end of the day twice as many nodes and the ability to turn on/off any node was
plenty. While my time isn't free, it wasn't worth another $0.8M to get out of
band fan RPMs and temps.
I'm all for better management technologies that save me time and work on every
server and compute node I have, as long as when all costs are considered they
are a net win.