Why not NT clusters? Need arguments.

Robert G. Brown rgb at phy.duke.edu
Fri Oct 6 16:30:48 PDT 2000


On Fri, 6 Oct 2000, Schilling, Richard wrote:

> Nice!  This is the type of thing that corporate types need to hear.  It is a
> difficult task to try and convince many managers/supervisors why they should
> steer away from NT.  Microsoft, although they have come up short on
> enterprise-grade "clusterable" machines, has done a great job of convincing
> many execs that NT is "good enough", and attainable.  Convincing them
> otherwise is what you're most likely up against here.
> 
> Great data and anecdotes is what it's going to take. . . .
> 
> 
> Richard Schilling
> Lake Stevens, WA

You have to be careful here.  I know that you understand the implied
meaning of "clusterable" because you've been on the list a fair while
but I'm a bit uncomfortable with the use of the word "cluster" as it has
been used in the thread so far.  As Greg Lindahl (IIRC) pointed out in a
recent relevant thread, "clustering" means different things to different
people.  NT does support some very nicely implemented failover solutions
(again anecdotally, as I personally would rather have my teeth drilled
with an almost discharged Black&Decker rechargeable drill and a dull bit
than work with any MS product, including NT) -- the kind of application
where you can run in a distributed/load-balanced mode, yank the plug on
a box, and have the application keep ticking.

This is very different from the kind of parallel application I was
describing (with Greg Warnes' inestimable detailed quantitative help :-)
in my previous response, the kind that is generally discussed on this
list.  It is still useful to differentiate the two in discussions with
corporate managers to avoid being trumped by MS sales reps.  If your
job/application is computationally intensive, moderately tightly coupled
(enough so that node failure causes failure of the job) and distributed,
then stability becomes a critical issue unless you REALLY spend
money/time to make your application robust to node failure.  If your
application is something like a distributed web server or DB server or
transaction processor, the SOFTWARE is often loosely coupled and written
to be robust against node failure (however expensive and difficult that
was originally to accomplish).  There is NT software that is indeed
robust in this way.  
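
To put a rough number on why stability dominates in the tightly coupled
case, here is a minimal sketch.  It assumes (my assumption, not a
measurement) that nodes fail independently with exponentially
distributed times between failures, and that any single node failure
kills the whole job.  The function name and the MTBF figures are
hypothetical placeholders, not measured values for NT or Linux.

```python
import math


def job_survival_probability(nodes, run_hours, node_mtbf_hours):
    """Probability that no node fails during the run.

    Assumes independent nodes with exponentially distributed time
    between failures, and a job that dies if any one node dies.
    """
    return math.exp(-nodes * run_hours / node_mtbf_hours)


# Hypothetical per-node MTBF values, for illustration only.
for mtbf_days in (30, 365):
    p = job_survival_probability(
        nodes=16, run_hours=72, node_mtbf_hours=mtbf_days * 24
    )
    print(f"node MTBF {mtbf_days:4d} days -> "
          f"P(72h job on 16 nodes completes) = {p:.2f}")
```

The point is only the scaling: the exponent grows with both node count
and run time, so a per-node failure rate that is tolerable on a desktop
can make a long multi-node run unlikely to finish.  A failover-robust
application does not obey this formula at all, which is exactly why the
two kinds of "cluster" need to be kept separate.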

Linux is just beginning to come up with failover-toughened OS variants
(e.g. Turbolinux and some variants of Red Hat) and associated
applications.  There's money there (Turbolinux is easily one of the most
expensive box-set linuxes) and there will be more there, but this isn't
really a traditional forte of Linux.  Part of this is pure economics.
Failover robustness is a pain in the ass to accomplish and requires a
lot of real work and investment.  Folks don't even try
unless real money is at stake, and up to a couple of years ago there was
little or no "real money" in Linux.  This is also (incidentally) one of
the differentiating features of Extreme Linux vs Beowulfery.  Extreme
includes both beowulfery and failover clustering and other exotic
clustering or non-clustering applications of linux and exists in part to
foster the development of those applications.  Beowulfery, as Greg L.
recently pointed out, is basically high >>performance<< (as opposed to
high reliability) computing on COTS Linux clusters.  It is thus the
best-known subset of the Extreme, but the two are not identical.

A useful rule of thumb is that if you are rolling the parallel/cluster
application yourself it is almost certainly not failover robust (unless
you work quite hard and expensively to make it so) and NT will be a poor
choice if the job also runs for a long time.  Actually, NT is probably a poor
choice for lots of reasons, only one of which is the robustness of the
application.  I personally would not like to develop a parallel
application on an NT cluster because all my favorite tools are missing
and all the tools available cost a lot of money and are highly
nonstandard (unless you view MS's efforts in code development software
as "standard").

If somebody else wrote it and made it robust against node failure, then
it is a pure cost-benefit issue.  If you have a choice (e.g. the
application is available for both linux and NT) compare the costs of the
application, the OS, and the admin staff for the two choices, since the
failover engineering insulates you from NT's instability (in principle).
You'll still almost always find the linux solution to be the cheaper one
if it exists.

In other cases (where maybe somebody else wrote it, but it isn't robust
against node failure) you have to think a bit, but linux is still likely
to be the best choice if a linux-based version of the application is
available, because literally everything is cheaper and more stable with
linux than with NT.  

We seem to find a theme here that is worth repeating -- Linux is across
the board far cheaper than NT and is generally far more stable and
robust at the OS level.  Only if a particular clustering application is
available only for NT, or if an organization is rich in NT experts so
that there are really significant personnel costs associated with
conversion is a deliberate selection of NT over Linux still worthwhile.

This can happen and will continue to happen as long as there are niche
products that "only" run on NT and as long as there are organizations
with a strong core NT staff (and significant capital investment in those
individuals).

Also, much as I personally dislike MS, there are a few arenas where
their products enjoy a decent reputation that isn't wholly ill-deserved.
They just tend to be expensive (and hence cost-benefit losers) and very,
very proprietary so that committing to them is like getting married to
somebody glamorous and expensive to keep that you aren't at all sure
that you really like.  Sure they're attractive and even good in bed, but
somehow you know that eventually, you're gonna pay for it...

   rgb

[P.C.-P.S.: Please note that the previous analogy, however tasteless,
wasn't a <it>sexist</it> analogy.  I didn't specify whether the
individual(s) involved were male, female, neither or both.  I think that
the experience of marrying the personality disordered but momentarily
"beautiful/handsome/available" when one doesn't really like them and
eventually discovering one's horrible mistake probably occurs on the
planet of a distant star where the creatures that are mating have six
sexes.  Or even just one, in the case of a highly narcissistic (but
evolved) species of yeast...;-)]

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu






