[Beowulf] RE: Capitalization Rates - How often should you replace a cluster? (resent - 1st sending wasn't posted ).

Robert G. Brown rgb at phy.duke.edu
Tue Jan 20 00:27:08 PST 2009


On Tue, 20 Jan 2009, Jon Aquilina wrote:

> in all honesty is there a need really to replace the cluster. why not just
> add onto it?

It depends on your resources.  You have to provide electricity to the
cluster nodes, cooling for them, and space for them, and they cost money
to maintain and administer.  If one assumes a cost per node of (say)
$2000 and a cost of electricity etc. per node of $200/year -- not insane
numbers -- and then counts out the years with an 18 month presumed
doubling time of performance, then at the end of three years performance
is 4x, and at the end of five years performance is at least 8x year 1.
At that point one is paying $200/year each for 8 nodes ($1600/year)
where for $400 extra you could replace them with a single node that ALSO
very likely costs you $200/year, so that you break just about perfectly
even.  Except that you really win, because your old nodes are old enough
to start breaking and use up 8x as much space, because Amdahl's law
favors fewer faster systems in most task scalings, because you reset
your warranty, and because (more) modern hardware typically has other
advantages.
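
For anyone who wants to check the arithmetic, here is a minimal Python
sketch of it.  The dollar figures and the 18 month doubling time are the
guesses from the paragraph above, not measurements.  Counting only
completed doublings is my assumption, made so that year five comes out
as the conservative "at least 8x".  The function names are made up for
the illustration.

```python
# Assumed figures from the paragraph above (illustrative, not measured).
NODE_PRICE = 2000        # dollars per new node
POWER_PER_NODE = 200     # dollars per node per year (electricity etc.)
DOUBLING_MONTHS = 18     # presumed performance doubling time


def speedup(years):
    """Speed of a new node relative to a year-0 node.

    Only completed 18-month doublings are counted (an assumption),
    so this is a lower bound between doublings.
    """
    return 2 ** ((years * 12) // DOUBLING_MONTHS)


def yearly_power_for_equivalent_old_nodes(years):
    """Annual power bill for the old nodes that one new node could replace."""
    return speedup(years) * POWER_PER_NODE


for years in (3, 5):
    old_power = yearly_power_for_equivalent_old_nodes(years)
    extra = NODE_PRICE - old_power
    print(f"year {years}: {speedup(years)}x, old nodes cost ${old_power}/year "
          f"in power, a new node is ${extra} more than that")
```

At year 5 this gives 8x, $1600/year for the old nodes, and a $400
difference, the same numbers as above.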

If somebody else pays the power bill, if you've got plenty of empty rack
space and can afford the power bill for still more nodes, if your nodes
are extremely reliable and you are a linux god and have the systems
configured so that human maintenance costs are nearly zero, then sure,
running the nodes past this point is fine and more beneficial than
throwing them away.  If you have fixed rack space and cooling and power
capacity, if you pay your own power bill, if you have "poor" hardware
that is starting to have lots of little niggling problems leading to
downtime and wasted human effort, or if you have to upgrade linux to get
it to run on your newest nodes but your oldest nodes don't have enough
memory to hold both your computations and a modern kernel, so that you'd
be forced to run an OS-heterogeneous cluster, then throwing them away
(or donating them to the poor -- e.g. a nearby high school) is a better
choice.

And no matter what, by year 6 if you DON'T replace them then SOMEBODY
will be paying more for electricity alone for the cluster than it would
cost to replace the entire cluster with 1/16th as many machines that
would get more work done, faster, for less.
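
The year 6 claim can be checked with a short Python sketch under the
same assumed numbers ($2000 per node, $200/year per node, 18 month
doubling, only completed doublings counted).  It looks for the first
year in which the annual power bill for the old nodes that ONE new node
could replace is more than the price of that new node.  The function
name and the one-year step are my choices for the illustration.

```python
NODE_PRICE = 2000        # dollars per new node (assumed)
POWER_PER_NODE = 200     # dollars per node per year (assumed)
DOUBLING_MONTHS = 18     # presumed performance doubling time


def first_year_power_exceeds_price(max_years=20):
    """Return (year, old_nodes_per_new_node) for the first year in which
    one year of power for the equivalent old nodes costs more than a
    new node, or None if that never happens within max_years."""
    for years in range(1, max_years + 1):
        # Old nodes needed to match one new node (completed doublings only).
        old_nodes = 2 ** ((years * 12) // DOUBLING_MONTHS)
        if old_nodes * POWER_PER_NODE > NODE_PRICE:
            return years, old_nodes
    return None


print(first_year_power_exceeds_price())  # (6, 16)
```

With these numbers the crossover is year 6, where 16 old nodes burn
$3200/year in power against a $2000 replacement.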

Hence the assertions below.  If you run your cluster in a relatively
"expensive" environment with high fixed costs per node per year (for
example if your nodes are sited in a commercial cluster room, where they
are likely to charge you a LOT more than $200 per node per year) then
your ideal replacement cycle is likely to be a lot less than 5 years.
If your work is valuable, your replacement cycle is likely optimal at
less than 5 years.  If your time is valuable and reliability is important,
replacement cycles that match your hardware maintenance agreement aren't
crazy.  But no matter what, by 6 years your nodes need to be taken out
and shot UNLESS you're just running a training cluster, not trying to
get work done.  For a high school that's great.  For somebody doing
regular computations, at some point in there an $800 Wal Mart Blue Light
Special can outcompute your whole 16 node cluster.  I've lived through
several turns of this particular wheel, and at some point nodes aren't
really worth turning on.
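
A rough Python sketch of how higher fixed costs shorten the cycle.  The
criterion is a deliberate simplification: the first year in which one
year of fixed costs for the old nodes that a single new node replaces
exceeds the price of that new node.  The $500 and $1000 per node per
year figures are hypothetical stand-ins for an "expensive" site, and the
$2000 node price and 18 month doubling (completed doublings only) are
assumed as before.

```python
def break_even_year(annual_cost_per_node, node_price=2000,
                    doubling_months=18, max_years=20):
    """First year in which running the old nodes equivalent to one new
    node costs more per year than buying that new node.

    Returns None if no such year is found within max_years.
    """
    for years in range(1, max_years + 1):
        # Old nodes one new node can replace (completed doublings only).
        old_nodes = 2 ** ((years * 12) // doubling_months)
        if old_nodes * annual_cost_per_node > node_price:
            return years
    return None


# $200/year is the figure used earlier; $500 and $1000 are hypothetical.
for cost in (200, 500, 1000):
    print(f"${cost}/node/year -> break-even at year {break_even_year(cost)}")
```

Under these assumptions the break-even year moves from 6 at $200/year
to 5 at $500/year and 3 at $1000/year.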

Strange, really, but there it is.  I'm fortunate to be at a University
which DOES pay for the power, and have run nodes out past five years
several times, but yeah, I've also spent a tiny bit of money and bought
desktops (or even laptops) that were faster than all those old nodes
working together.

    rgb

> 
> On Mon, Jan 19, 2009 at 2:40 AM, Robert G. Brown <rgb at phy.duke.edu> wrote:
>       On Sun, 18 Jan 2009, Greg Lindahl wrote:
>
>             On Fri, Jan 16, 2009 at 08:10:43PM -0500, Mark Hahn wrote:
>                         The question was raised as "When should all
>                         these servers be upgraded or replaced again?"
>
>                   3-5 years, IMO.  if you replace hardware in <3 years,
>                   you're obviously burning money.
>
>             After factoring in rent and utilities, my replacement time
>             is < 3 years. So what's obvious to you doesn't seem very
>             obvious to me!
> 
> 
> Right.  There are rules of thumb, but they are based on assumptions, and
> if the assumptions aren't applicable they will lead to non-optimal
> behavior.  The right rule is "do the cost-benefit analysis and act
> according to what it tells you".  Which will, in fact, often lead to a
> 3-5 year replacement cycle.  But additional costs along the way alter
> the landscape, shifting to different replacement cycles.
> 
> To do the computation correctly, you have to include all sorts of
> marginal costs and benefits.  For example, sometimes there are nonlinear
> benefits to finishing work faster, which favors shorter cycles.
> Sometimes there are nonlinear costs (or higher than normal linear
> costs), which ALSO favors shorter cycles.
> 
> On the other hand, if somebody else pays for the power, and you have no
> source of money to buy replacement nodes, you run nodes until they die.
> All the "usual" CBAs assume a constant flow of support moneys for new
> nodes, and that is not always the case.
> 
>  rgb
> 
>
>       -- greg
> 
> 
>
>       _______________________________________________
>       Beowulf mailing list, Beowulf at beowulf.org
>       To change your subscription (digest mode or unsubscribe)
>       visit http://www.beowulf.org/mailman/listinfo/beowulf
> 
> 
> Robert G. Brown                            Phone(cell): 1-919-280-8443
> Duke University Physics Dept, Box 90305
> Durham, N.C. 27708-0305
> Web: http://www.phy.duke.edu/~rgb
> Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
> Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
> --
> Jonathan Aquilina
> 
>

Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977

