[Beowulf] MS HPC... Oh dear...

Gardner Pomper gardner at networknow.org
Mon Jun 12 08:12:25 PDT 2006

Well, at the risk of the entire mailing list coming down on me, I must
disagree with the prevailing viewpoint.

I have been following this list for quite a while, not participating because
the vast majority are obviously working on much more advanced and detailed
projects than I am. This is, however, what I see as the major problem in the
HPC area. There is too much focus on the ultra-high performance libraries
and interconnects required for weather studies and other hugely complicated
parallelizable code. I think this has led to a very narrow vision of the
use of parallel computing.

There are huge numbers of applications that could benefit from
parallelization at a task level. They may not be in the category of
"embarrassingly" parallel, but they need significant computing resources and
they need to be solved quickly (in development time, that is). The
architecture of these is more distributed than parallel, in the classic
sense, but clusters should provide a VERY cost effective platform for this.
Buying PCs with gigabit ethernet and 4GB of RAM for $1000 each and hooking
them all together is perfectly adequate for this type of application. We
don't need Myrinet, etc, etc. Is there a reason that a rackmount case costs
more than the PC I put in it? Must a cluster cost $3000/node, just because I
am going to run parallel software on it?

Most importantly for this thread, must I have a dedicated person to install,
configure and maintain a cluster? I am working at IBM and have managed to
get a small cluster (1 Bladecenter rack). We managed to hire a severely
geeky contractor for other work, but he has had to configure and tweak and
rebuild kernels and install the kitchen sink for months to get this to work.
It is now really cool, with parallel RAID for the cluster hard drives, etc,
but if he leaves, we have a boat anchor. We don't CARE about the last 10% of
performance. We will just buy another computer. We are a development house
running mathematical models that we split by decomposition, because we turn
out 3 models a month and don't have 5 years to write a parallel solver for
each. But, we DO run 50,000 of these models and aggregate them to get the
results we need.
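
To make the task-level pattern concrete, here is a minimal sketch in Python. It is an illustration only, not our actual code: `solve_model` and `run_and_aggregate` are hypothetical names, the square root is a placeholder for a real model, and the sum is a placeholder for whatever aggregation is actually needed. The point is that each run is independent, so the only "parallel programming" required is farming the runs out and collecting the answers.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
import math


def solve_model(model_id: int) -> float:
    """Hypothetical stand-in for one independent model run."""
    # Placeholder arithmetic; a real mathematical model would go here.
    return math.sqrt(model_id)


def run_and_aggregate(n_models: int, executor: Executor) -> float:
    """Farm out n_models independent runs and sum their results."""
    # Executor.map hands each model_id to a worker and yields the
    # results in submission order, so the aggregation is deterministic.
    results = executor.map(solve_model, range(n_models))
    return sum(results)


if __name__ == "__main__":
    # A thread pool keeps the sketch simple. For CPU-bound models one
    # would pass a process pool or a cluster job scheduler instead.
    with ThreadPoolExecutor(max_workers=4) as pool:
        total = run_and_aggregate(1000, pool)
    print(total)
```

Because the executor is passed in, the same two functions would work unchanged with `concurrent.futures.ProcessPoolExecutor` on one box, and the shape of the problem stays the same when the workers are separate PCs on gigabit ethernet.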

I guess I have rambled a bit, but what attracted me to cluster computing was
the cost/benefit of it. That seems to be going by the wayside. It doesn't
seem possible to get a "cluster" for anywhere near the price of a bunch of
PCs. It takes a HUGE amount of manpower, which actually costs money, to get it
to work. What HPC needs to expand its base is a simple, inexpensive, turnkey
cluster that you can just add PCs onto. I don't see any indication that it
is going that way. Until it does, HPC will remain the domain of universities
and research labs. This is a crying shame, because business could really put
that computing power to use if the cost of entry was not so high.

As to what this has to do with MS? Hopefully it will be a turnkey solution.
Probably too expensive, because of the existing market, but simple to get
working. No, the MCSEs won't know about high speed interconnects and MPI
internals, but who cares? The business community won't. And there will go
another market that linux SHOULD have owned.

- Gardner Pomper

On 6/12/06, Robert G. Brown <rgb at phy.duke.edu> wrote:
> On Mon, 12 Jun 2006, Geoff Jacobs wrote:
> > Joe Landman wrote:
> >> To a degree this is my point.  Microsoft (I am not arguing their case,
> >> just my impression of it) is going to try to make all this work out of
> >> the box for you.  It remains to be seen how well it works.  I can't wait
> >> for the first support calls about the ch_p3/p4 device though ....
> > I can't wait for the Apple ad.
> >
> > They'll probably change them to what Windows users are used to. I've
> > always found Microsoft's error codes entertaining - cryptic,
> > uninformative, useless. An error code from a Microsoft product usually
> > requires google to translate.
> Sigh.
> The issue here is strictly one of commercial software.  I very much
> doubt that you'll see many folks who roll their own parallel software
> migrating in droves to MS clusters for precisely these reasons -- they
> have to write their own code, debug their own code, and haven't the
> time, the staff, the budget to deal with cryptic error codes and
> commercial support mechanisms with built in obfuscation and delay.
> Remember also that the "advantages" of being able to run MS-only are
> largely illusory, since MCSEs are largely clueless about advanced
> networking and parallel computation and parallel scaling and MPI and...
> so WinXX clusters will face precisely the same support and programming
> challenges that Linux clusters face without the rather huge base of
> coders, the beowulf list for distributed (free) support, a wide
> selection of consultants and turnkey vendors, magazine columns and
> websites such as the Monkey.  This is an "if we build it, they will
> come" moment for MS.
> So will they come?  Companies that are trying to SELL ready-to-run
> parallel applications will simply love this.  It gives them a definitive
> target platform to code for (unlike the plethora of rapidly evolving
> linux based distros and their associated clusters).  It is also an a
> priori given that any customer who contemplates an MS cluster in the
> first place for longer than thirty seconds (long enough to see the price
> tag) is willing to spend money like water in order to run their mission
> critical parallel application (whatever it might be).  If you're
> spending hundreds of dollars per node on the operating system, are you
> likely to balk at the notion of spending hundreds of dollars per node on
> the application software?  Compared to linux -- the entire OS and all
> these apps were FREE, right?  One can install an entire cluster for the
> cost of setting up a PXE server/repo, and even boot and run the whole
> thing diskless, in as little as one day.  Only people who really don't
> give a rodent's furry behind about money will be willing to spend a ton
> of money for a product where it takes days for your cluster-ignorant
> MCSEs just to figure out the licensing arrangements for the nodes (and
> to learn what a node IS).
> So IMO the MS move is targeted at a very carefully specified market,
> with little or no interest or hope of expanding beyond that market.
> Their potential customers are those who will build large to very large
> clusters to run very specific, commercially sold and supported software,
> and will have very deep money-is-no-object-because-it-is-not-our-own
> pockets.  NIH comes immediately to mind, as does Hollywood (less so).
> These are folks that are happy to spend money like water as long as
> things "just work" according to their particular definition of the term,
> which generally means "so that our team of six FTE MCSE admins can keep
> it running transparent to us".  They won't even care if things work
> OPTIMALLY and many will not even understand how to assess things like
> parallel scaling -- they just want them to work, and probably to dump
> their results directly into Office Pro so they can make their power
> point presentations with nifty excel figures without any intermediate
> data import step or the hassle of managing two operating systems.
> Will Microsoft's move succeed?  Probably.  If and only if they get
> ENOUGH commercial software ready to plug and play AND somebody willing
> to buy it, although they could lose money on this for years and still
> fund it just to be able to claim market presence and not think twice
> about it.  IMO nobody is going to buy MS clusters to write their own
> (real) parallel code unless they plan to sell it, though. MAYBE they
> will find some entrée into the grid (embarrassingly parallel) market
> where people will be willing to write and compile code to run on a
> MS-based grid for free; maybe they'll get some gaming companies to use
> their platform to manage multitasking on a dual-core dual-cpu box to
> improve game performance or the like (again on a shrink-wrap basis).  I
> don't think that they'll find a lot of takers for the system as a
> research-level development platform if the real costs are passed back to
> the users, though, both because those costs will be close to twice what
> they are for linux clusters and because MS code development doesn't
> really favor the unix-derived posix environment for moving data around
> -- its biggest advantage is in visual/windowed stuff, e.g. VB.  What
> good is VB in a grid, where your application CANNOT do anything
> whatsoever with a GUI?
> So programmers will be reduced to writing in raw C, C++, Fortran,
> POSIX-style code regardless.  So do they really want to do this using a
> compiler that costs a node (or two), an operating system that costs a
> third of a node (per node), parallel extensions that cost even more (per
> node), library issues, complex licensing arrangements (how IS Microsoft
> going to control cost scaling inside the cluster, especially if the
> cluster itself lives inside a firewall, hmmm?) and ALSO have to deal
> with BSOD every time some exotic feature of their code that probes a
> part that is NEVER explored in MS's more traditional applications
> generates a terminal fault?  With any POSSIBLE hope of real optimization
> largely beyond their grasp, since the OS will absolutely hide all
> details of the network interface that might be used to optimize (given
> that permitting tuning here might destabilize the OS and increase BSOD
> occurrences and give a bad perception of the stability of MS's product)?
> Even those NIH folks are likely to start counting nodes vs cost at some
> point -- getting a cluster 1/2 the size because you want to run MS
> clustering software just won't cut it, I don't think.
> So, naaaaa, not likely to be a popular development platform for real
> researchers writing their own code or using open source code.
> Commercial only.
> So the real (rhetorical:-) question is:  Who is writing the commercial
> software that will run on this? and: Is anyone out there going to buy it
> at rates that scale per node, given that one is basically SPENDING nodes
> to get it?  It isn't about the OS, it's about applications,
> applications, applications.  One has been able to develop and run
> parallel applications on WinXX systems for a long time now, really --
> pretty much as long as on linux if not longer.  PVM for WinXX existed
> back when Linux was but a gleam in Linus's eye.  There is a REASON that
> even way back then, when WinXX was much CHEAPER than commercial Unices,
> it was a most unpopular platform for parallel code development, and that
> reason hasn't really changed.  If "enough" parallel killer apps are
> written for MS clusters and "enough" people actually buy them, MS will
> survive in this market.  If not, hey, they probably won't lose money on
> it given that any sales at all are likely to carry a high profit margin
> relative to direct costs, and it isn't really that difficult to build a
> "cluster" on top of ANY operating system that groks "network".  The
> clustering team is probably tiny and cheap, as MS projects typically go,
> and at the moment is probably spending more time ensuring that there are
> commercial apps ready to go than they are ensuring that there is
> anything particularly "cool" about the clustering environment itself.
>      rgb
> --
> Robert G. Brown                        http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu