[Beowulf] how Google warps your brain

Robert G. Brown rgb at phy.duke.edu
Wed Oct 27 12:48:57 PDT 2010

On Thu, 21 Oct 2010, Jack Carrozzo wrote:

> To add my $0.02 to Bill's points, it becomes more difficult also when dealing
> with multiple groups to decide on the type of setup and whatnot. 
> Where I went to school, the Math dept had a huge shared-memory SGI setup
> whilst the Physics department had a standard Beowulf cluster. Both groups
> used their systems rarely, and other departments had been asking for HPC
> hardware also. However, after long debates by all parties, a
> single infrastructure couldn't be decided upon and each independent dept
> just got a little money to fix up their current systems.

It's not a trivial question.  I (when I do HPC at all) run trivially
parallel simulations that run for a long time independently and then
produce only a few numbers and could (and once upon a time, did) use
sneakernet for my IPCs and job control and still get excellent speedup.
Other people need bleeding edge networks in unusual topologies, or
enormous disk arrays with high speed access, or huge amounts of memory
(in any combination) in order to do their work.  Where I would care only
about building a bigger pile of fast-enough, big-enough, cheap PCs, they
would care about building a real beowulf with more (maybe even a lot
more) spent on memory and network and disk than on lots of cores at best
possible FLOPS/Dollar.  Then there are differences in utilization
patterns, robustness of the programs in a large cluster, how deep your
pockets are, how expensive systems management is, whether or not you can
DIY, whether the cluster you need requires renovations such as dedicated
AC and power and space or whether you can stack up a pile of processors
in a corner of your office without either blowing a fuse or melting
down, whether you
are a computer geek yourself or if you are slightly scared of actually
touching a mouse because it might bite.

I personally got into cluster computing because the large computing
center built at enormous expense by the state was -- "useless" to me
doesn't do it justice, doesn't convey the waste of money and resources
that my utilization of the resource even "for free" (with a small grant
of heavily shared time) represented.

Ultimately, it is as true today as it was then that YMMV, that people's
needs vary wildly (and the cluster architecture that represents the CBA
optimum varies along with them), that optimizing all of this across a
large group of users doing very different kinds of work is more
difficult still (and introduces politics, new economic costs and
benefits, questions of the relative "value" of the research being done
by different groups, and much more into the not-terribly trivial
equations involved).

In the end, for many people I'm quite certain that the best possible
solution (from a CB point of view) is to build their own cluster for
their own use (this is almost a no-brainer if they can use 100% of the
duty cycle of any cluster in the Universe that they can afford in the
first place).  Sure, they may WANT to participate in a shared cluster,
but if they do it is only to get the free cycles, without any need or
intention of contributing free cycles of their own, as their needs are
infinite.  Or else any reasonable shared-architecture cluster will
either be too lacking in some key resource to be useful to them, or will
have too much spent on the "standard" cluster nodes, so that buying in
wastes more value than they gain relative to buying just what THEY need.

For others the opposite is true.  This is particularly true when there
are lots of people who ARE doing embarrassingly parallel computations
(say), especially ones where their work pattern represents only 20-30%
utilization.  Work very hard for three months, then do something else
for four (writing the papers, getting more experimental data, whatever).
Then sharing with other people with similar requirements can cut the
three months down to one, perhaps, if things work out just right.
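
To put rough numbers on that last claim, here is a back-of-the-envelope
sketch.  It is not a model of any real scheduler.  It assumes (my
assumptions, for illustration only) that every group contributes an
identical cluster, that the work is embarrassingly parallel and so
scales linearly with node count, and that some fraction of the other
groups happen to be busy at the same time you are.  The function name
and its parameters are made up for the example.

```python
def pooled_burst_months(burst_months, n_groups, overlap_fraction=0.0):
    """Estimate how long one group's burst of work takes on a pooled cluster.

    burst_months:     how long the burst takes on the group's own cluster
    n_groups:         number of groups pooling identical clusters (assumed)
    overlap_fraction: fraction of the OTHER groups busy during the burst
                      (0.0 = nobody else is computing, 1.0 = everybody is)
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if not 0.0 <= overlap_fraction <= 1.0:
        raise ValueError("overlap_fraction must be between 0 and 1")
    # Capacity available to this group, in units of "one group's cluster":
    # its own cluster plus whatever the other groups are leaving idle.
    idle_others = (n_groups - 1) * (1.0 - overlap_fraction)
    available = 1.0 + idle_others
    # Linear scaling is assumed, so run time shrinks in proportion.
    return burst_months / available


if __name__ == "__main__":
    # Three groups, perfectly staggered bursts: the "just right" case.
    print(pooled_burst_months(3, 3))
    # Three groups that all want the machine at once: no gain at all.
    print(pooled_burst_months(3, 3, overlap_fraction=1.0))
```

With three groups whose busy periods never collide, the three-month
burst drops to one month.  If everybody wants the machine at the same
time, sharing buys nothing, which is the "if things work out just
right" caveat above.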

So yup, YMMV.  Cluster "centers" done right can be great.  "Clouds"
of desktops, mediated by things like Condor, can be great.  Personal
clusters can be great.  Small/departmental clusters can be ideal,
especially if computational needs are homogeneous within the department
(but not between departments).  "Great" in this context means
"cost-benefit near-optimal given your resources and needs".  One size,
or architecture, does not fit all.

But "cluster computing", and the beowulfery studied and discussed on
this list, has long had the virtue of cutting ACROSS the diversity, with
smart people sharing ideas and experiences that give you a decent chance
of putting together a >>good<< CBA solution if not a >>perfect<< one,
for a very wide range of possible tasks and attendant architectures and
political/economic resource constraints.


> -Jack
> On Thu, Oct 21, 2010 at 11:19 AM, Bill Rankin <Bill.Rankin at sas.com> wrote:
>       Good points by Jim, and while I generally try and avoid "me too"
>       posts, I just wanted to add my two cents.
>       In my previous life I worked on building a central HPC cluster
>       facility at Duke.  The single biggest impediment to creating
>       this resource was actually trying to justify its expense and put
>       an actual number on the cost savings of having a centrally
>       managed system.  This was extremely difficult to do given the
>       way the university tracked its infrastructure and IT costs.
>       If a research group bought a rack or two of nodes then they were
>       usually hosted in the local school/department facilities and
>       supported by local IT staff.  The cost of power/cooling and
>       staff time became part of a larger departmental budget and
>       effectively disappeared from the financial radar.  They were not
>       tracked at that level of granularity.  They were effectively
>       invisible.
>       Put all those systems together into a shared facility and all of
>       a sudden those costs become very visible.  You can track the
>       power and cooling costs.  You now have salaries for dedicated
>       IT/HPC staff.  And ultimately you have one person having to cut
>       some very large checks.  And because of the university funding
>       model and the associated politics it is extremely difficult, if
>       not impossible, to actually recoup funds from the departments or
>       even the research groups who would be saving money.
>       In order to make it work, you really need the senior leadership
>       of the university to commit to making central HPC infrastructure
>       an absolute requirement, and sticking to that commitment when it
>       comes budget time and the politics are running hot and heavy
>       over who gets how much.
>       Now to most of us this is a rehash of a conversation that we
>       have had often before.  And with clusters and HPC pretty much
>       established as a necessity for any major research university,
>       the development of central facilities would seem to be the
>       obvious solution.  I find it somewhat concerning that
>       institutions like Harvard are apparently still dealing with this
>       issue.
>       -bill

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
