Quick survey -- UPSs on slave nodes?
jeffrey.b.layton at lmco.com
Tue Feb 11 03:29:11 PST 2003
"Robert G. Brown" wrote:
> On Mon, 10 Feb 2003, Jeffrey B. Layton wrote:
> > Greg Lindahl wrote:
> > >On Mon, Feb 10, 2003 at 11:24:07AM -0800, Nicholas Webb wrote:
> > >
> > >>At the University of Idaho we are preparing to order a new beowulf cluster
> > >>and a vendor seemed to be shocked that we wanted UPSs attached to ALL of
> > >>the nodes.
> > >
> > >Most clusters I've worked on have had UPSes on all of the nodes. Among other
> > >reasons, it's an easy way to make sure your power is clean.
> > >
> > >greg
> > >
> > Greg has a point. However, for us, we would rather have more
> > nodes than UPSes. We have problems that UPSes might not solve
> > (like maintenance guys who flick power switches on the main
> > panel or people who unplug power cords - UPSes don't help
> > too much in those cases :)
> > So far, with a total of somewhere near 700 nodes in various
> > clusters over 2+ years, we haven't had any problems.
> It's really (surprise surprise:-) a cost-benefit issue, so either answer
> could be right -- for you.
RGB is correct (with his comments below). I forgot to add that all of
our main server rooms have conditioned power (we've checked on
that one), UPSes for the room, and diesel backup for some of them.
I guess that's why we haven't had any trouble :) Seriously, we did
have one cluster in a room without conditioned power, UPS, or diesel
backup for about 24 months without any trouble. The only trouble we
had was that some numnuts decided it would be good to use the raised
floor area for a conference room, and the people in the conference
room decided it was too cold and turned off the AC! I can't tell you
how many times I heard the temperature alarm and had to run down
there and turn the AC back on! The last time I did it, I burst into the
conference room and yelled at a number of people for endangering
our $350,000 cluster and their lives.
> Benefits: Clean, conditioned power (leading to less hardware breakage
> and downtime). Cluster stability across "short" power outages (order of
> a second to perhaps a few minutes). Time for an orderly shutdown or
> emergency checkpoint?
>
> Costs: Money that could be spent on more nodes. Space that could be
> spent on more nodes. Higher consumption of energy and more heat to be
> removed. Possibly some toxic waste products associated with aged out
> UPS and batteries down the road.
>
> Adding a bit of complexity is that cheap UPSes tend to have batteries that
> are a significant point of failure in their own right, usually right
> when you need them, and expensive UPSes, well, aren't cheap. A UPS can bump
> the per-node cost (hardware, heating, cooling, space) by 5-10%.
>
> If your computation fails if any node goes down (so that the cost of a
> node fault is "high") and/or if power in your area is known to be
> fault-prone with lots of short faults or spikes, UPSes may increase node
> reliability and permit you to get more work done, on average, per dollar.
>
> If you only lose the most recent work done by a single node if it goes
> down (so that the cost of a node fault is "low"), and/or if power is
> known to be reliable and clean, you may find that buying 5-10% (or so)
> more nodes is a better bet for getting the most work done.
>
> Note that a high cost for humans (systems administration or systems
> maintenance) also favors UPS, as they minimize the human costs of node
> failure and downtime even where the losses in terms of the computation
> aren't important.
> So dial your own comfort level.
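
The trade-off RGB describes can be put into rough numbers. What follows is
only a minimal sketch in Python, not anything taken from the thread: the
function names, the $2,000 node price, and the loss fractions are made-up
assumptions for illustration. Only the 5-10% UPS overhead and the $350,000
figure echo numbers mentioned above. It compares useful node-hours per year
for the same budget spent with and without per-node UPSes.

```python
HOURS_PER_YEAR = 8760


def useful_node_hours(budget, node_cost, ups_overhead, loss_fraction):
    """Estimate useful node-hours per year for a fixed budget.

    ups_overhead  -- fractional increase in per-node cost due to a UPS
                     (0.0 means no UPS; the thread suggests 0.05-0.10)
    loss_fraction -- assumed fraction of node-hours lost to power-related
                     faults (hypothetical; measure this for your own site)
    """
    per_node = node_cost * (1.0 + ups_overhead)
    nodes = int(budget // per_node)  # whole nodes only
    return nodes * HOURS_PER_YEAR * (1.0 - loss_fraction)


BUDGET = 350_000       # dollars, the cluster price quoted in this thread
NODE_COST = 2_000      # dollars per node, hypothetical
UPS_OVERHEAD = 0.075   # midpoint of the 5-10% estimate


def compare(loss_with_ups, loss_without_ups):
    """Return which option yields more useful node-hours for BUDGET."""
    with_ups = useful_node_hours(BUDGET, NODE_COST, UPS_OVERHEAD, loss_with_ups)
    without_ups = useful_node_hours(BUDGET, NODE_COST, 0.0, loss_without_ups)
    return "UPS" if with_ups > without_ups else "more nodes"


if __name__ == "__main__":
    # "Low" fault cost: clean power, only a single node's recent work is lost.
    print("clean power, cheap faults: ", compare(0.01, 0.03))
    # "High" fault cost: dirty power, one node fault kills the whole job.
    print("dirty power, costly faults:", compare(0.01, 0.15))
```

With these invented inputs the answer flips from "more nodes" to "UPS" as the
assumed loss without a UPS grows, which is all the sketch is meant to show.
It ignores the human costs and the UPS battery failures mentioned above.
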
> Robert G. Brown http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
Lockheed-Martin Aeronautical Company - Marietta
Aerodynamics & CFD
"Is it possible to overclock a cattle prod?" - Irv Mullins