[Beowulf] MS HPC... Oh dear...
Robert G. Brown
rgb at phy.duke.edu
Mon Jun 12 07:30:32 PDT 2006
On Mon, 12 Jun 2006, Geoff Jacobs wrote:
> Joe Landman wrote:
>> To a degree this is my point. Microsoft (I am not arguing their case,
>> just my impression of it) is going to try to make all this work out of
>> the box for you. It remains to be seen how well it works. I can't wait
>> for the first support calls about the ch_p3/p4 device though ....
> I can't wait for the Apple ad.
> They'll probably change them to what Windows users are used to. I've
> always found Microsoft's error codes entertaining - cryptic,
> uninformative, useless. An error code from a Microsoft product usually
> requires google to translate.
The issue here is strictly one of commercial software. I very much
doubt that you'll see many folks who roll their own parallel software
migrating in droves to MS clusters for precisely these reasons -- they
have to write their own code, debug their own code, and haven't the
time, the staff, the budget to deal with cryptic error codes and
commercial support mechanisms with built in obfuscation and delay.
Remember also that the "advantages" of being able to run MS-only are
largely illusory, since MCSEs are largely clueless about advanced
networking and parallel computation and parallel scaling and MPI and...
so WinXX clusters will face precisely the same support and programming
challenges that Linux clusters face without the rather huge base of
coders, the beowulf list for distributed (free) support, a wide
selection of consultants and turnkey vendors, magazine columns and
websites such as the Monkey. This is an "if we build it, they will
come" moment for MS.
So will they come? Companies that are trying to SELL ready-to-run
parallel applications will simply love this. It gives them a definitive
target platform to code for (unlike the plethora of rapidly evolving
linux-based distros and their associated clusters). It is also an a
priori given that any customer who contemplates an MS cluster in the
first place for longer than thirty seconds (long enough to see the price
tag) is willing to spend money like water in order to run their mission
critical parallel application (whatever it might be). If you're
spending hundreds of dollars per node on the operating system, are you
likely to balk at the notion of spending hundreds of dollars per node on
the application software? Compare that to linux, where the entire OS and
all the apps are FREE, right? One can install an entire cluster for the
cost of setting up a PXE server/repo, and even boot and run the whole
thing diskless, in as little as one day. Only people who really don't
give a rodent's furry behind about money will be willing to spend a ton
of money for a product where it takes days for your cluster-ignorant
MCSEs just to figure out the licensing arrangements for the nodes (and
to learn what a node IS).
So IMO the MS move is targeted at a very carefully specified market,
with little or no interest or hope of expanding beyond that market.
Their potential customers are those who will build large to very large
clusters to run very specific, commercially sold and supported software,
and will have very deep money-is-no-object-because-it-is-not-our-own
pockets. NIH comes immediately to mind, as does Hollywood (less so).
These are folks that are happy to spend money like water as long as
things "just work" according to their particular definition of the term,
which generally means "so that our team of six FTE MCSE admins can keep
it running transparent to us". They won't even care if things work
OPTIMALLY and many will not even understand how to assess things like
parallel scaling -- they just want them to work, and probably to dump
their results directly into Office Pro so they can make their
PowerPoint presentations with nifty Excel figures without any intermediate
data import step or the hassle of managing two operating systems.
Will Microsoft's move succeed? Probably, but only if they get
ENOUGH commercial software ready to plug and play AND somebody willing
to buy it. That said, they could lose money on this for years and still
fund it without thinking twice, just to be able to claim market presence.
IMO nobody is going to buy MS clusters to write their own
(real) parallel code unless they plan to sell it, though. MAYBE they
will find some entrée into the grid (embarrassingly parallel) market
where people will be willing to write and compile code to run on an
MS-based grid for free; maybe they'll get some gaming companies to use
their platform to manage multitasking on a dual-core dual-cpu box to
improve game performance or the like (again on a shrink-wrap basis). I
don't think that they'll find a lot of takers for the system as a
research-level development platform if the real costs are passed back to
the users, though, both because those costs will be close to twice what
they are for linux clusters and because MS code development doesn't
really favor the unix-derived POSIX environment for moving data around
-- its biggest advantage is in visual/windowed stuff, e.g. VB. What
good is VB in a grid, where your application CANNOT do anything
whatsoever with a GUI?
So programmers will be reduced to writing in raw C, C++, Fortran,
POSIX-style code regardless. So do they really want to do this using a
compiler that costs a node (or two), an operating system that costs a
third of a node (per node), parallel extensions that cost even more (per
node), library issues, complex licensing arrangements (how IS Microsoft
going to control cost scaling inside the cluster, especially if the
cluster itself lives inside a firewall, hmmm?) and ALSO have to deal
with BSOD every time some exotic feature of their code that probes a
part that is NEVER explored in MS's more traditional applications
generates a terminal fault? With any POSSIBLE hope of real optimization
largely beyond their grasp, since the OS will absolutely hide all
details of the network interface that might be used to optimize (given
that permitting tuning here might destabilize the OS and increase BSOD
occurrences and give a bad perception of the stability of MS's product)?
Even those NIH folks are likely to start counting nodes vs cost at some
point -- getting a cluster 1/2 the size because you want to run MS
clustering software just won't cut it, I don't think.
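To put rough numbers on that node-counting argument, here is a minimal back-of-the-envelope sketch. Every figure in it is a made-up assumption for illustration (the budget, the hardware price per node, the two-thirds-of-a-node guess for the parallel extensions, the two-node compiler), and `nodes_for_budget` is a hypothetical helper, not anything from a real pricing sheet.

```python
# Back-of-the-envelope node count for a fixed budget.
# ALL prices below are hypothetical; none come from a real price list.

def nodes_for_budget(budget, hardware_per_node, software_per_node=0, fixed_software=0):
    """Return how many whole nodes a fixed budget buys.

    software_per_node: licensing that scales with node count (OS, extensions).
    fixed_software: one-time costs such as a compiler license.
    """
    per_node = hardware_per_node + software_per_node
    return (budget - fixed_software) // per_node

BUDGET = 100_000   # assumed total budget, in dollars
NODE = 1_500       # assumed hardware cost of one node, in dollars

# Free OS and tools: the whole budget goes into hardware.
linux_nodes = nodes_for_budget(BUDGET, NODE)

# Licensed stack, using the rough proportions argued above:
#   OS at a third of a node per node,
#   parallel extensions assumed at two thirds of a node per node,
#   compiler assumed at two nodes, paid once.
licensed_nodes = nodes_for_budget(
    BUDGET,
    NODE,
    software_per_node=NODE // 3 + 2 * NODE // 3,
    fixed_software=2 * NODE,
)

print("free-software cluster:", linux_nodes, "nodes")
print("licensed cluster:     ", licensed_nodes, "nodes")
```

With these invented numbers the same budget buys 66 nodes with free software and 32 with the licensed stack, i.e. roughly the "cluster 1/2 the size" outcome. Change the assumed proportions and the ratio moves accordingly.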
So, naaaaa, not likely to be a popular development platform for real
researchers writing their own code or using open source code.
So the real (rhetorical:-) question is: Who is writing the commercial
software that will run on this? and: Is anyone out there going to buy it
at rates that scale per node, given that one is basically SPENDING nodes
to get it? It isn't about the OS, it's about applications,
applications, applications. One has been able to develop and run
parallel applications on WinXX systems for a long time now, really --
pretty much as long as on linux if not longer. PVM for WinXX existed
back when Linux was but a gleam in Linus's eye. There is a REASON that
even way back then, when WinXX was much CHEAPER than commercial Unices,
it was a most unpopular platform for parallel code development, and that
reason hasn't really changed. If "enough" parallel killer apps are
written for MS clusters and "enough" people actually buy them, MS will
survive in this market. If not, hey, they probably won't lose money on
it given that any sales at all are likely to carry a high profit margin
relative to direct costs, and it isn't really that difficult to build a
"cluster" on top of ANY operating system that groks "network". The
clustering team is probably tiny and cheap, as MS projects typically go,
and at the moment is probably spending more time ensuring that there are
commercial apps ready to go than they are ensuring that there is
anything particularly "cool" about the clustering environment itself.
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu