[Beowulf] SC05 blogs and observations

Robert G. Brown rgb at phy.duke.edu
Wed Nov 23 06:17:06 PST 2005

On Tue, 22 Nov 2005, Srihari Angaluri wrote:

> Also, I don't think you need a big cluster to prove that you have cluster 
> software. And I think that's what Microsoft was trying to do - prove that

Really?  Dang.  I thought that was right in the charter: "Must have at
least one big cluster running to prove that you have cluster software."
After all, a SMALL cluster can be run by carrying a floppy disk from
system to system and starting an application off it (no kidding, been
there, done that, even got decent scaling given that it was
embarrassingly parallel).  To be blunt, real HPC is all about scaling,
and to MARKET scaling to a cynical and even hostile user base it really
helps to have a big cluster running perfectly (where "perfectly" is
pretty modest: without crashing every time you try to run a large
distributed application during the few days of the conference/expo).

High Availability clustering, OTOH, is about convincing business people
that you can support a usable server farm model and is as much about
management interface and GUI as anything because "performance" is
typically a pretty coarse measure.  Even there it helps to have a
largish cluster at what IS, after all, a giant sales bash.

This said, what exactly is "cluster software"?  On most clusters, it is
PVM or MPI.  On a very few, it is raw sockets (few because as you say,
it is a lot easier to prototype and test distributed applications on top
of a parallel library than it is to use Sockets).  MPI almost certainly
wins out overall because of portability, both across clusters and across
the remaining big-iron hardware out there, and some of my friends would
hurt me if I didn't also acknowledge that MPI-2 is second generation
feature rich and efficient as well.

To this one might fairly add batch/queue managers (SGE, PBS), cluster
monitoring software (ganglia, wulfware), cluster management
software/distros (warewulf, OSCAR, ROCKS), and cluster "operating
systems" (Mosix, Scyld, bproc).  However, each of these is on a minority
of clusters individually, I think.  Many clusters are "just" a pile of
PCs with MPI or PVM running pretty much a single application.

Is "cluster software" the moral equivalent of perl with threads and ssh?
Well, yes, maybe; I did actually write a CWM article on using perl with
threads and ssh to distribute an EP task and collect results.  However,
I would never call perl "cluster software" just because it now has
internal thread support (permitting it to fork off and manage lots of
independent subtasks) and is adjoined with ssh (permitting it to
remotely execute those subtasks).
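The pattern is simple enough to sketch.  Here is a minimal, purely
illustrative Python version of the same idea (Python rather than perl
just to keep the sketch short); the "remote" worker here is a
hypothetical stand-in run as a local subprocess, where a real cluster
would wrap the same command in ssh:

```python
# Sketch of the threads-plus-remote-execution pattern: fork off
# independent subtasks, run each one "remotely", collect the results.
# For an embarrassingly parallel (EP) task the only IPC is the final
# gather.  NOTE: the worker command below is a made-up stand-in; on a
# real cluster the cmd would be  ["ssh", "nodeNN", "worker", ...].
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_subtask(start: int) -> int:
    # Stand-in worker: a local Python one-liner that computes a
    # partial sum for its slice of the work.
    cmd = [sys.executable, "-c",
           f"print(sum(range({start}, {start} + 1000)))"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return int(out.stdout)

def distribute(starts):
    # One thread per subtask; the threads merely babysit subprocesses,
    # so they spend their lives blocked on I/O -- exactly the perl
    # ithreads trick described above.
    with ThreadPoolExecutor(max_workers=len(starts)) as pool:
        return list(pool.map(run_subtask, starts))

if __name__ == "__main__":
    results = distribute([0, 1000, 2000, 3000])
    print(sum(results))  # same as sum(range(4000))
```

Decent scaling falls out for free as long as the subtasks really are
independent, which is the whole point of calling the task EP.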

In a very deep sense, the only "cluster software" that matters was
invented by AT&T, UCB, and Sun Microsystems (and others too numerous to
name, such as DARPA and Al Gore:-) a long long time ago.  That would be
Unix itself, integrated with a real, scalable, routable network.  It had
user-transparent timeslicing and multitasking, interactive control
features (a shell), and an open standards-based routed network that was
>>scalable<< to 2^31 hosts or so even on a flat network, far more with
routing.

So forgive me if I am more than a bit cynical about Microsoft's efforts
in this arena.  Of course they are all about making money and
maintaining their monopoly and all of that -- anything less would be an
affront to their shareholders, and FAIR competition is a good thing
anyway.  However, anyone who has actually lived through the last 25
years of computing is aware that "fair competition" is something of an
oxymoron in business in general and that using it in the same breath as
the word "Microsoft" is openly ironic, or perhaps sarcastic.

Let me be perfectly clear about this.  What we will NOT see is Microsoft
participating in the open cluster development process supported by this
list.  We will NOT see it contribute GPL tools written for
standards-based compilers and libraries.  We will NOT see a user-level
applications development environment.  We will NOT see free, we will NOT
see cheap, except possibly where they "inherit" and deploy existing Open
Source tools to get things started and gain market credibility.

What we will see (peering into my crystal ball) is Microsoft targeting a
very specific market -- very likely genetics research or some other
NIH-funded, deep pocketed group of researchers who are Unix averse
because they lack the skills and expertise to make a Unix-like system
work and want everything to run from a Windows GUI and to dump all its
output straight into e.g. Excel and PowerPoint.  Here they will go head
to head with Apple and turnkey linux/bsd cluster builders, but they have
a decent chance of winning the market AND MAKING MONEY FROM IT because
the problem is actually not that complex from an IPC point of view (and
is more closely related to e.g. HA computing than, say, CFD or Monte
Carlo) and because the people they will sell to will value a GUI and
ease of use as much as they will value the actual speed and efficiency
of the computations.

The other thing they will leverage -- unfortunately quite successfully
-- is their ability to deliver a kernel that is "happy" with
shrink-wrapped proprietary device drivers, and their ability to offer
lavish development support to makers of hardware devices to ensure that
they offer drivers that work with their setup.  I wouldn't be surprised
to see Microsoft buying out one or more makers of high end cluster
network hardware if they think they can get away with it on the
antitrust side -- something that at first they probably can (noting that
they already own and sell a variety of hardware devices and that
companies like Sun and Apple clearly indicate that there is no
FUNDAMENTAL problem with doing so).  This will permit them to have a bit
of fun jerking the card's ABI around in tune with THEIR driver but at
the expense of any open source driver, once it has anything like a broad
market.
Or is this too cynical a picture for even my cracked crystal ball?

Once they are entrenched in a real market with a real profit margin,
they will "invent" "new" cluster software to use on the back end of
their GUI -- something that is trivial to do, really, especially if you
are DELIBERATELY reinventing the wheel so you can give it a proprietary
API.  They will use their market presence to push their product into new
markets by giving it away (at first) and funding a few projects lavishly
to win a place on the top 10 of the top 500 (much as Apple did a few
years ago, and for the same reasons).  Then they will try to use their
new/proprietary back end and their "popular" cluster front end to pry
some 60% of the cluster market away from linux and other unices,
concentrating only on the profitable part -- the part susceptible to the
argument that paying $100/node/year plus $5000/cluster/year (or more) in
cluster-specific software support costs is a bargain as long as it means
that they can "control" their cluster from a really coolio control panel
at their desktop with their mouse without having to know how the damn
thing works.  They will (of course) include batch/queuing support,
policy support, and so much more directly in this interface and it will,
in fact, be fairly attractive.  Might even stimulate some related
efforts in the Open Source universe -- finally.

The "hard" 40% of the market they'll leave to Apple (to avoid antitrust
suits, at least for the time being) and linux in general.  In five years
if you look at any industry magazine catering to cluster users, it will
look like Microsoft "invented" the cluster and that any cluster that
isn't using Microsoft's clustering toolset (MCT) is bound to be slow,
expensive to manage and debug, and hence costly beyond measure in a CBA.

Mind you, this is their PLAN (I'm channelling somebody in their
marketing department, not their engineering department). I think that it
isn't quite half right.  I think that if they push very hard, they are
looking at 25-30% of the cluster market in five years time, and that
while they will make money there (by focussing on low-hanging fruit with
the highest potential profit margins and by inducing new customers who
wouldn't otherwise have done clustering to do a cluster) they won't make
MUCH money there because they'll find (to their chagrin) that several of
their assumptions above are incorrect -- cluster builders DO tend to
want to use standards based software, DO care about portability, most
DEFINITELY do care about stability and performance, and DO care about
cost.
Unlike the situation with Borland International (whom they crushed in
the software development market by a mix of underselling them and
controlling the operating system on top of which the integrated
compilers had to run) it is simply not possible to undersell a market
where any cluster builder has -- among many other choices -- not one but
MANY ways of building a cluster with zero software costs.  And we're not
talking shabby clusters, either -- clusters that e.g. install in a
matter of a day or so, PXE install or PXE boot diskless, clusters that
deliver bleeding edge performance with advanced networks and the ability
to pay for and run ONLY commercially developed cluster applications if
they pay for software at all.

Because of this I expect that they'll find the market tough going, and
it may even be a spectacular failure for them, one where they LOSE money
for years and years.  How many cluster builders buy RHEL or SuSE for
their cluster OS, when they can use Scientific Linux, CentOS, FCX,
Debian, Mandriva -- for free?  How many clusters use Scyld given bproc
or OpenMosix for free, given warewulf or ROCKS?  How many clusters are
built by turnkey vendors rather than by the owners?  How DOES the cost
scaling work if you have to hire an MCSE from a highly competitive
market AND pay for Microsoft rather than hire a linux/cluster SE (also
from a highly competitive but less commercially lucrative market) but
get to use free software with no per-node scaling of software costs, or
if you compare turnkey clusters built on top of linux to turnkey (third
party) clusters built on top of Microsoft products?

Remember, per node costs EAT INTO THE TOTAL NUMBER OF NODES and hence
the total compute power in aggregate FLOPS.  For better or for worse,
aggregate FLOPS "sells" in an ignorant marketplace.  In order to "make
money", Microsoft is going to need to be paid some percentage of the
cost, per node, of a cluster.  That percentage will need to be LARGER
for small clusters than for big ones.  If we assume that they'll need at
least 5% of the per-node cost just to pay the marketing expenses of
their products (a fair guess, I think) then a MS-based cluster will
START OUT at 95% the maximum aggregate FLOPS of a linux or freebsd-based
cluster using one of the free distros or cluster packages.
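For a concrete (and entirely hypothetical) version of that arithmetic,
fix a budget and a per-node cost and watch the node count shrink:

```python
# Back-of-the-envelope version of the argument above: with a fixed
# budget, any per-node software cost buys fewer nodes, and aggregate
# FLOPS scales with node count.  All numbers are illustrative, not
# real pricing.
def nodes_bought(budget: float, node_cost: float, sw_tax: float) -> int:
    # sw_tax is the software cost expressed as a fraction of the
    # per-node hardware cost (e.g. 0.05 for the 5% guessed above).
    return int(budget // (node_cost * (1.0 + sw_tax)))

budget = 100_000.0   # dollars, hypothetical
node_cost = 2_000.0  # per-node hardware cost, hypothetical

free_nodes = nodes_bought(budget, node_cost, 0.00)  # free software
paid_nodes = nodes_bought(budget, node_cost, 0.05)  # 5% software tax

print(free_nodes, paid_nodes, paid_nodes / free_nodes)
```

With these made-up numbers the 5% tax costs three nodes out of fifty,
so the paid cluster starts out at 94% of the free cluster's aggregate
FLOPS before a single job has run.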

There is market there.  Scyld has customers, turnkey vendors have
customers, some folks probably DO pay RHEL costs on clusters.  People
pay for commercial cluster applications, too, and those people will view
paying "more" for a commercial OS to run them on (but with some
interface bells and whistles) from a very different CBA perspective.
However, the market is "in irons" to borrow an expression from sailing
-- sailing into a wind, but without the forward momentum needed to
properly tack, so one constantly risks being pushed backwards or into
the reefs on either side.  Scyld sailors (oops, baadd pun:-) can perhaps
navigate these waters in a small and responsive craft.  The largest tall
ship in the Universe, however, offers a lot of surface area to the
unfavorable wind and while its hull is ironclad, the reefs are
unforgiving sharp teeth that lurk beneath the surface.

Or to speak less metaphorically, Microsoft has >>never<< tackled a low
margin marketplace without some form of leverage, and I have my doubts
about their ability to establish themselves in a marketplace where the
software margin is >>zero<< on, say, maybe 90% of all clusters.  It
leaves them arguing about opportunity costs and management costs versus
up front software costs, an argument they've been LOSING pretty steadily
in the server room, with its ready supply of opportunity-cost labor and
its intolerance of bullshit, for years now.  I think they're gambling on
the existence of a "desktop clustering market" -- small to midsized
clusters run by single grant supported users from their plain old
desktop -- plus the ability to make enough money from a few larger
clusters to break even while they gain market visibility.  I don't think
they'll "lose" -- not immediately, anyway -- because they'll create new
clients in the process even where they fail to win old ones away from
linux.  But they won't win, either.  Not in HPC.


> they do have cluster software now, as opposed to talking in vacuum. From what 
> I heard, scalability and performance are not even in the picture yet because 
> this is the first version of the software, and not to mention what they 
> showed at SC was a beta. So, there was no need for a big rack there :) I 
> think in a way it better proves that they are more serious about this stuff, 
> rather than showing a big dead rack with one or two nodes running their 
> software really. Anyway, let me make my statements true by saying "I am 
> speculating." :) Heh, that was easy!
>> I had heard from some of the people at CTC that .NET was going to be very
>> useful with clusters. I think it can work for some applications, but of
>> course not all.
> Well, Java is equally easy for clusters too :) .NET is just another enabling 
> tool. It is interesting for Windows clusters because you can quickly 
> prototype and test distributed applications as opposed to writing apps in 
> traditional ways, like using Sockets. Plus, .NET is more friendly for Windows 
> because Windows Server 2003 and beyond have native .NET support in the OS. 
> However, .NET is not an MPI replacement (and frankly I don't think it ever 
> will be).
>> Thanks for your feedback.
> Thanks for having an open mind :)
> Srihari
>> --
>> Doug
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit 
> http://www.beowulf.org/mailman/listinfo/beowulf

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email: rgb at phy.duke.edu
