[Beowulf] The True Cost of HPC Cluster Ownership

Daniel Pfenniger Daniel.Pfenniger at unige.ch
Tue Aug 11 10:28:22 PDT 2009


Joe Landman wrote:
> Gerry Creager wrote:
>> Daniel Pfenniger wrote:
>>> Douglas Eadline wrote:
> 
> [...]
> 
>>> This article sounds unbalanced and self-serving.
>>
>> I thought it read a bit like a chronicle of my recent experiences.

Mine were not so bad, so I found the tone too pessimistic.

> I think that this article is fine, not unbalanced.  What I like to point 
> out to customers and partners is
> 
>     There is a cost to *EVERYTHING*

Well, that is not really surprising.  The point is to be quantitative,
not subjective (fear, etc.).  Each solution has a cost, and alert
people will choose the one that is best for them, not for the vendor.
If many people choose IKEA furniture over traditional vendors,
it is because the cost differential is favourable for them,
even taking all the overheads into account.

When commodity clusters arrived in the 1990s, the gain was easily a
factor of 10 at purchase.  In my case the maintenance and license
costs of turn-key, locked-in hardware added 20-25% of the purchase
cost every year.  For that money we could have hired a full-time
engineer instead, but that was not possible because of the
locked-in nature of such machines.  The self-made solution
was clearly the best.
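
To put rough numbers on this, here is a minimal sketch of the
arithmetic.  It assumes the commodity purchase price is normalized to
1.0 (a hypothetical unit), takes the turn-key price as 10 times that,
and charges maintenance as a flat fraction of the turn-key purchase
price each year.  The function name and the chosen year counts are
illustrative only, and staff time on the self-made side is left out
because no figure for it is given here.

```python
def cumulative_cost(purchase, annual_rate, years):
    """Purchase price plus flat yearly fees charged as a fraction of purchase."""
    return purchase * (1.0 + annual_rate * years)


# Assumption: commodity purchase price normalized to 1.0 (hypothetical unit).
commodity_price = 1.0
# "Factor of 10 at purchase" for the turn-key, locked-in system.
turnkey_price = 10.0 * commodity_price

for rate in (0.20, 0.25):          # 20-25% of purchase cost per year
    for years in (1, 3, 5):        # illustrative ownership periods
        total = cumulative_cost(turnkey_price, rate, years)
        print(f"fees {rate:.0%}/year, {years} year(s): "
              f"turn-key total = {total:.1f}x the commodity purchase price")
```

Under these assumptions the yearly fees alone match the turn-key
purchase price after four to five years.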

Today one finds intermediate solutions where the hardware is
composed of compatible elements and the software is open source.
Some vendors offer almost ready-to-run, tested hardware for
a reasonable margin, adding less than a factor of 2 to the original
hardware cost, without horrendous maintenance fees or restrictive
licenses.  The lock-in effect is low, though not quite zero.
This is probably the best solution for many budget-conscious
users.


> 
> Heinlein called it TANSTAAFL.  Every single decision you make carries 
> with it a set of costs.
> 
> What purchasing agents, looking at the absolute rock bottom prices do 
> not seem to grasp, is that those costs can *easily* swamp any purported 
> gains from a lower price, and raise the actual landed price, due to 
> expending valuable resource time (Gerry et al) for months on end working 
> to solve problems that *should* have been solved previously.
> 
> There is a cost to going cheap.  This cost is time, and loss of 
> productivity.  If your time (your students time) is free, and you don't 
> need to pay for consequences (loss of grants, loss of revenue, loss of 
> productivity, ...) in delayed delivery of results from computing or 
> storage systems, then, by all means, roll these things yourself, and 
> deal with the myriad of debugging issues in making the complex beasts 
> actually work.  You have hardware stack issues, software stack issues, 
> interaction issues, ...

You forget to mention that turn-key, locked-in systems, in my
experience, entail inefficiency costs, because the user cannot decide
what to do when kept completely unaware of what is going on.  Many
problems can be solved in minutes when the user controls the cluster,
but may take days or weeks when the fix must come from the vendor.
A balanced presentation should weigh all the aspects of running a
cluster.

> 
> What I am saying is that Doug is onto something here.  It ain't easy. 
> Doug simply expressed that it isn't.

> As for the article being self serving?  I dunno, I don't think so.  Doug 
> runs a consultancy called Basement Supercomputing that provides services 
> for such folks.  I didn't see overt advertisements, or even, really, 
> covert "hire us" messages.  I think this was fine as a white paper, and 
> Doug did note that it started life as one.

You may have noticed that this article was originally written at the
request of SiCortex...


	Dan






