[Beowulf] General cluster management tools - Re: Southampton engineers a Raspberry Pi Supercomputer
landman at scalableinformatics.com
Sun Sep 16 12:48:17 PDT 2012
On 09/16/2012 02:08 PM, Andrew Holway wrote:
>> With regards to risk perception, I am still blown away at some of the
>> conversations I have with prospective customers who, still to this day,
>> insist that "larger company == less risk". This is demonstrably false.
>> A company with open products (real open products), open software stacks,
>> ... that lowers risks. With the closed, vendor lock-in stacks, you
>> increase risk. And this is, perversely, what is called "lowering risk"
>> by people looking for an excuse to go with the larger company.
> This is also demonstrably false. Just because cluster vendor A is
> using a completely open source stack does not mean that you have any
> less risk than Cluster Vendor B with their proprietary closed source
Risk is a function of how much control you have over the stack when one
of your suppliers changes its business operations, whether slightly or
drastically. If one of the critical elements of your stack is completely
closed, you have no control over that element and cannot swap it out
without incurring great cost/time/effort. Yes, that is, by definition,
an increased risk versus a functionally similar part (of similar
operational level and quality) which is completely open.
You said my thesis is demonstrably false, and I provided the simple
argument that supports my thesis. Your argument is ... what? You disagree?
> I have seen Rocks clusters that are an utter bag of Scheiße because the
> people deploying it had no clue and also seen Clusters based on Bright
> et al that were perfectly executed. And vice versa for that matter.
We understand that you are currently engaged in a Bright Cluster Manager
deployment. We are (see company info in .sig), for the record, a
reseller (in the past anyway) of their tools (though we haven't sold any
for a number of reasons that I won't get into). Do you have a business
relationship with them? I see no problem advocating for them if you
disclose your interests, though Chris S/Doug E/... are the arbiters.
Rocks clusters are great first do-it-yourself clusters. It's a great way
to learn some of the basic things you need to worry about. I don't
necessarily consider Rocks to be a preferable kit, and with the
university copyright bit, anyone who wants to deploy it commercially
needs permission from the university to do so (and will owe some sort of
license fees for this).
The source is open, but, as a very long time user/deployer/supporter of
their kit, I can tell you that anaconda (upon which most of their work
is built) is an insanely fragile and dangerous platform upon which
to develop. They've worked hard to work around issues, but
fundamentally, there are things so (completely) borked in anaconda that
it's better to simply minimize your time in it at all costs and perform
installs after it completes.
Unfortunately, Rocks is so closely tied in to anaconda that defects and
design failures in the latter, IMO, negatively impact the former. This
said, there are many Rocks users out there that don't need/want anything
more complex than what they have to offer. Some are on this list.
More to the point, this is a straw man anecdotal argument you make.
I've seen very crappy ... insanely crappy commercial code deployments
and excellent open source deployments in identical situations, and vice
versa. Doesn't mean much the way you've stated it.
Bright offers some cool features, and we thought we would use it with
our cluster customers. Alas, it did not support what our customers
requested, and Bright wasn't interested in adding it (which is fine,
they had perfectly good business reasons not to), so we used our own
tools to handle it. And for the record, our tools (Tiburon's core
functionality) are completely open. They're written in Perl, and if we were
hit by a bus in a physical or metaphorical sense, our customers could
continue to get support by paying someone to do this.
Could customers of Bright Cluster Manager get that same support if
Matthijs and team decided to become anglers for a living? No? Why not?
Doesn't this indicate ... risk?
This is not to diminish Bright or Matthijs and team. The product is very
good, and if I didn't indicate agreement with your previous posts
extolling its virtues, then that was my omission. It is very good.
Point and clicky. A CLI for those who want one. Many good things built in.
But there is an inherent dependency upon a single company for a
critical function in a system. This is, potentially, a single point of
failure for the system should they decide to do something else.
FWIW: I normally recommend it to more Windows-y admin types who like
pointy-clicky for cluster admin. Painless setup for them. They are
deep in their comfort zone.
This is why there is such an active 3rd party market for some of our
(former) competitors' old storage units ... parts really. The company no
longer makes them, no longer supplies parts, and the customers haven't
quite (yet) decided that their risk reduction involves kicking the
non-open units to the curb (that or, like the many, many who insist large
company == lower risk, they haven't quite internalized that almost
exactly the opposite is true).
> The risk is with the people doing the deployment and the choices that
> the customer makes. It just takes some bad memory or a batch of bad
> motherboards and the whole project goes to crap as trust is lost.
No ... the risk is in the long term support. Deployment risk is fairly
trivial to manage for reasonable size installations, and the issues you
cite would, for a reputable vendor, never see the light of day in front
of the customer. The rack-n-stack shops don't do much of this
amelioration, but the people with a clue burn stuff in hard. They beat
the heck out of the machines before they ship, with the idea that if you
ship something that is known to work at the outset, it becomes *much*
easier to debug in the field.
We do this with our storage and, when we build them, clusters.
Customers occasionally yell at us over the additional delay we tell them
about after we encounter a failure under our load tests, but they get
good stuff that is known to work under fairly heavy loads.
> Please save these arguments for the Richard Stallman appreciation society.
And here I thought we were having a nice discussion, and you slip in a
little snide comment like this. Sad.
Just remember, archives are forever, and some of your potential future
customers/employers will be googling for you when you claim expertise in
some area or the other. Comments like this reveal something of you as a
person, add nothing to your reputation, and increase risk associated
with your "brand." You might want to consider that carefully before you
Let's keep the discussion reasonable and polite. Keep the S/N high.
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615