[Beowulf] A bit OT - scientific workstations - recommendations

Robert G. Brown rgb at phy.duke.edu
Sun Mar 5 05:36:46 PST 2006

On Fri, 3 Mar 2006, Joe Landman wrote:

> Douglas Eadline wrote:
>>> - 24/7/365 next day on site support
>> Let's consider this idea in light of commodity hardware.
>> (i.e. why I don't buy service contracts on light bulbs)
>> Assuming that you can ship a node back for no cost repair within a
>> warranty period, the question to ask is how many spare nodes
>> can I buy for the price of a service contract for all the nodes
>> in my cluster?
> We live in the era of disposable computing.
> If your business case demands that you have no down time, then you engineer 
> around that.  If it demands you minimize costs, you need to adjust your 
> expectations on what you will get for those costs.

Your metaphor is a bit strained, Doug.  Nodes more often resemble
employees than light bulbs.  Losing an employee, or
even two, is rarely fatal to a business, but there are often highly
nonlinear costs associated with things like an "epidemic" that affects
your entire workforce.

So, the issue is: do you get health insurance for your employees that
more or less guarantees that EVEN an epidemic will cost you only a day
or so of downtime and a minimal amount of personal stress, or do you
plan on "playing doctor" for your employees, one at a time, as they
become ill, living with whatever downtime that process generates and
paying the cost of their medicines out of pocket?

[Hey, at least it isn't a car metaphor;-)]
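Doug's spares question and the "epidemic" objection can be put side by side in a toy cost model.  This is a minimal sketch, not anything proposed in the thread: the function names, prices, hourly rate, and downtime figures are all hypothetical placeholders, and the model simply assumes that every failure costs some admin time and that failures beyond the spare pool leave nodes down for a fixed number of days.

```python
# Toy comparison: "buy spares and fix it yourself" vs. "buy a service contract".
# All names and numbers are hypothetical placeholders, not data from the thread.


def spares_affordable(contract_cost: float, node_price: float) -> int:
    """How many spare nodes the price of the service contract would buy."""
    return int(contract_cost // node_price)


def self_repair_cost(failed_nodes: int, spares: int, hours_per_fix: float,
                     hourly_rate: float, downtime_cost_per_node_day: float,
                     days_per_extra_failure: float) -> float:
    """Cost of handling failures yourself.

    Assumption: each failure costs admin labor, and each failure beyond the
    spare pool also leaves a node down for a fixed number of days.  The cost
    jumps once failures outrun the spares, as in an "epidemic" where every
    node needs a BIOS reflash or new fans.
    """
    labor = failed_nodes * hours_per_fix * hourly_rate
    uncovered = max(0, failed_nodes - spares)
    downtime = uncovered * days_per_extra_failure * downtime_cost_per_node_day
    return labor + downtime


if __name__ == "__main__":
    contract = 20_000.0  # hypothetical contract price for the whole cluster
    node = 2_000.0       # hypothetical price of one commodity node
    spares = spares_affordable(contract, node)

    # A couple of lemons vs. an "epidemic" hitting 64 nodes.
    for failed in (2, 64):
        cost = self_repair_cost(failed, spares, hours_per_fix=3.0,
                                hourly_rate=50.0,
                                downtime_cost_per_node_day=100.0,
                                days_per_extra_failure=7.0)
        print(f"{failed} failed: self-repair ${cost:,.0f} "
              f"vs. contract ${contract:,.0f}")
```

With a handful of failures the spares win easily; once failures exceed the spare pool, the downtime term dominates.  Whatever numbers you plug in, the point is that the comparison is not linear in the number of failures.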

So Joe's observation is apropos.  You engineer for your own particular
perception of costs of downtime and willingness to accept risks,
INCLUDING the substantial cost of your own time screwing around with

Having lived through one "nightmare" cluster where every node had to be
reengineered on the fly, where every node had to have its BIOS
reflashed to become semistable, where every node had to be "fixed",
where every node had to have its cooling fans replaced -- I will not
willingly do that again.  We saved 10% or so on the purchase price, but
paid it out 10 times over in a mix of direct costs for parts and human

For a home cluster, a hobby cluster, or a small prototyping cluster (<= 8
nodes), sure, working without a net is reasonable.  For a professional
grade production cluster you CAN consider doing it and might have an
entire career where it works for you.  Or you could be OH SO SORRY the
very first time the whole damn thing breaks and is out of its
paltry 90-day manufacturer's warranty (assuming that you get the cheapest of
commodity boxes while seeking to save maximal money:-).


> It all comes down to the major compromises that you need to engineer for.  Be 
> it a supercomputer, a cluster, an SMP, a desktop.

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
