Quick survey -- UPSs on slave nodes?

Robert G. Brown rgb at phy.duke.edu
Mon Feb 10 15:52:50 PST 2003

On Mon, 10 Feb 2003, Jeffrey B. Layton wrote:

> Greg Lindahl wrote:
> >On Mon, Feb 10, 2003 at 11:24:07AM -0800, Nicholas Webb wrote:
> >
> >>At the University of Idaho we are preparing to order a new beowulf cluster
> >>and a vendor seemed to be shocked that we wanted UPSs attached to ALL of
> >>the nodes.
> >>
> >
> >Most clusters I've worked on have had UPSes on all of the nodes. Among other
> >reasons, it's an easy way to make sure your power is clean.
> >
> >greg
> >
> Greg has a point. However, for us, we would rather have more
> nodes than UPSes. We have problems that UPSes might not solve
> (like maintenance guys who flick power switches on the main
> panel or people who unplug power cords - UPSes don't help
> too much in those cases :)
> So far, with a total of somewhere near 700 nodes in various
> clusters, for about 2+ years, we haven't had any problems.

It's really (surprise surprise:-) a cost benefit issue, so either answer
could be right -- for you.

Benefits:  Clean, conditioned power (leading to less hardware breakage
and downtime).  Cluster stability across "short" power outages (order of
a second to perhaps a few minutes).  Time for an orderly shutdown or
emergency checkpoint?

Costs: Money that could be spent on more nodes.  Space that could be
spent on more nodes.  Higher consumption of energy and more heat to be
removed.  Possibly some toxic waste products associated with aged out
UPS and batteries down the road.

Adding a bit of complexity is that cheap UPS tend to have batteries that
are a significant point of failure in their own right, usually right
when you need them, and expensive UPS, well, aren't cheap.  UPS can bump
the per-node cost (hardware, heating, cooling, space) by 5-10%.

If your computation fails if any node goes down (so that the cost of a
node fault is "high") and/or if power in your area is known to be
fault-prone with lots of short faults or spikes, UPS may increase node
reliability and permit you to get more work done, on average, per dollar.

If you only lose the most recent work done by a single node if it goes
down (so that the cost of a node fault is "low"), and/or if power is
known to be reliable and clean, you may find that buying 5-10% (or so)
more nodes is a better bet for getting the most work done.
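
To make the tradeoff concrete, here is a rough sketch of the comparison in
Python.  It is a toy model, not a measurement: the function names, the
uptime fractions, and the budget figures are all made up for illustration.
The only number taken from the discussion above is the 5-10% per-node
overhead.  It assumes node faults are independent, that a "low cost" fault
loses only that node's share of the work, and that a "high cost" fault
stalls the whole computation.

```python
def nodes_for_budget(budget, node_cost, ups_overhead=0.0):
    """Whole nodes a fixed budget buys when a UPS adds a fractional per-node cost."""
    return int(budget // (node_cost * (1.0 + ups_overhead)))


def expected_throughput(n_nodes, node_uptime, tightly_coupled):
    """Expected productive node-equivalents under a toy independence assumption.

    Loosely coupled: each node contributes on its own -> n * uptime.
    Tightly coupled: every node must be up at once   -> n * uptime**n.
    """
    if tightly_coupled:
        return n_nodes * node_uptime ** n_nodes
    return n_nodes * node_uptime


def better_buy(budget, node_cost, ups_overhead,
               uptime_bare, uptime_ups, tightly_coupled):
    """Return which option yields more expected work for the same money."""
    bare = expected_throughput(
        nodes_for_budget(budget, node_cost), uptime_bare, tightly_coupled)
    with_ups = expected_throughput(
        nodes_for_budget(budget, node_cost, ups_overhead), uptime_ups,
        tightly_coupled)
    return "UPS" if with_ups > bare else "more nodes"


if __name__ == "__main__":
    # Hypothetical inputs: $100k budget, $1k nodes, 8% UPS overhead,
    # 99% uptime without a UPS and 99.5% with one.
    for coupled in (False, True):
        print(coupled, better_buy(100_000, 1_000, 0.08, 0.99, 0.995, coupled))
```

With these invented numbers the loosely coupled case comes out in favor of
more nodes and the tightly coupled case in favor of UPS, which is the
pattern described above.  Plug in your own fault rates and prices before
drawing any conclusion from it.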

Note that a high cost for humans (systems administration or systems
maintenance) also favors UPS, as they minimize the human costs of node
failure and downtime even where the losses in terms of the computation
aren't important.

So dial your own comfort level.


Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
