[Beowulf] MS HPC... Oh dear...

Robert G. Brown rgb at phy.duke.edu
Mon Jun 12 10:28:20 PDT 2006


On Mon, 12 Jun 2006, Joe Landman wrote:

>> In the best of all universes, this is true.  But in the best of all
>> universes, I'd have naked slave girls fanning me and periodically
>> stuffing grapes in my mouth while I type, right?
>
> Hmmm.... might your significant other have something to say about this?

Significant others, plural, you mean.  In my harem.  Houris, one and
all.

Besides, it is perfectly clear on purely logical grounds that even a
singular significant other wouldn't have anything to say about this in
the best of all universes.  Saying something about it would obviously
contradict the premise.  No, in the best universe she'd help me pick the
most beautiful and compliant slave girls and feed me viagra on
demand...;-)

(Kidding, just kidding, oh no honey don't <ouch> seriously <ouch> I was
just <ouch> kidding....;-)

> The problem is that (for commercial software) you now have to
> build/support N different distros.  And the good folks at Microsoft are
> going to make that point to the world, to the ISVs, etc.

This is why God invented rpmbuild --rebuild, and the fact that it
sometimes fails (and debian) are why God is slowly, painfully, trying to
get around to implementing a full binary compatibility layer for linux.
But God is in no hurry.

The reason it is taking God so long is that God is of two minds on
this -- monolithic compatibility is simultaneously a boon to
people who want everything to "just work" and be easy, and a curse to
people who want things to be able to actually change and get better.
Hence compromise, bet-hedging, religious wars, RHEL vs FC vs Ubuntu vs
Slackware vs Debian (and I'm not omitting YOUR personal favorite out of
malice, only because there are so many choices these days).  Evolution
in action, not a monolithic Creator building "perfect" things that
never change.

RHEL vs FC vs Rawhide is perhaps the cleanest in-line spectrum to make
this clear.  RHEL is slow and conservative and will be supported for
eternity, and this makes certain developers and implementers as happy as
it makes others very sad.  I have a whole bunch of hardware in my house
that Will Not Run FC4 and/or CentOS 4 -- too recent, no support, sorry.
The package that I rely on "the most" on my laptop -- NetworkManager --
is badly broken and unfixable (literally, wrong dbus support) on FC4 but
works pretty well and is improving rapidly under FC5.  Rawhide and the
actual developer's personal systems, well...

HOWEVER, it is really pretty easy to write software that will build in a
distribution-agnostic way for NON-bleeding edge things like numerics and
cluster apps in linux, and not too bad to do really complex graphical
applications that are cross distro portable as well.  Abomination from
hell or not (religious issue), GNU's automake/autoconf do "work", for
example, and are better than other abominations from hell of years past
(such as aimk or imake) that serve the same purpose.

Remember that nearly all the software that runs on nearly all the
distros IS in fact shared between all the distros.  That is, people do
this all the time, and it is so cheap that it doesn't generally require
professional coders at ALL to enable once the package is laid out in the
first place. ./configure --prefix=/usr;make (inserted inside your
favorite packaging template) -- who here doesn't know exactly how to do
this, and do it all the time? I would actually argue that it is by and
large CHEAPER in EVERY DIMENSION to build software that works across
"all of linux" than it is to build software that works across "all of
windows", an observation that is at least somewhat empirically supported
by the fact that much of it gets built and supported by part time labor
contributed by people with day jobs instead of a team of a half-dozen
full-time dedicated programmers.  Using free tools, by the way.

But this is only marginally relevant to my point -- that one difference
between DLLs and SOs is that Windows DLLs and the DLL install process
are unfortunately not terribly version aware and are easily, even commonly,
abused by commercial software vendors to the detriment of customers.  I
have heard it said (and have no reason to doubt) that a real pro-grade
MCSE can stabilize a windows network so it runs close to as well as a
linux network by the simple means of ruling software installs with an
iron fist -- basically never letting new DLLs from a new package onto
the system that break a package already on the system.  The problem is
that This Is Not Easy (just as it was once Very Difficult pre-yup for
RPM-based linux, where at least RPMs WERE dependency aware), and can
limit the software you permit to be installed pretty radically.  On
WinXX you CAN'T just do a rebuild of your package with its ancient,
incompatible DLL so that it uses the nice new one (porting/updating code
as needed), can you?
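
The "iron fist" policy described above boils down to a check like the
one below.  This is only a sketch in Python: the package names, DLL
names, and version strings are hypothetical, invented for illustration,
and it does not reflect how any real Windows installer behaves.

```python
def find_conflicts(installed, candidate):
    """Return libraries the candidate package would replace with a different version.

    installed: dict mapping library name -> (version, owning package)
    candidate: dict mapping library name -> version shipped by the new package
    """
    conflicts = []
    for lib, new_version in sorted(candidate.items()):
        if lib in installed:
            old_version, owner = installed[lib]
            if old_version != new_version:
                conflicts.append((lib, owner, old_version, new_version))
    return conflicts


# Hypothetical data: every name and version here is made up.
installed = {
    "widgets.dll": ("4.0", "OldApp"),
    "netcore.dll": ("2.1", "OldApp"),
}
candidate = {"widgets.dll": "5.2", "netcore.dll": "2.1", "extras.dll": "1.0"}

for lib, owner, old, new in find_conflicts(installed, candidate):
    print(f"refuse: {lib} {old} (needed by {owner}) would be replaced by {new}")
```

An administrator applying the policy would reject any install for which
the returned list is non-empty, which is exactly why it limits what can
be installed so radically.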

So it is really a two edged sword here as well.  Software vendors as
always face the same choice -- tie the entire maintenance/upgrade cycle
for their software into that of WinXX or "cheat" and distribute the DLL
that works with their old but fully paid-up version of the software and
keep reaping maximum marginal profits without additional investment (and
screw the people who break their systems installing it:-).  Perhaps DLLs
are in principle strongly versioned and there are rules and tools to
prevent dependency issues, but if so it isn't apparent from the DLL
names themselves and the ease with which e.g. trojanned DLLs can be
slipped into a WinXX system or the ease with which a WinXX system can be
broken by duelling DLLs.
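
For contrast, the usual linux convention puts the version in the shared
object's file name (libNAME.so.MAJOR.MINOR...), with the soname
conventionally being libNAME.so.MAJOR.  A minimal Python sketch of
reading that convention follows.  It assumes conventionally named files
and looks only at the name; the authoritative soname is recorded inside
the library itself, and parse_shared_object is a hypothetical helper,
not part of any real tool.

```python
import re

# Matches names such as "libfoo.so.1.2.3": library stem, then dotted version.
_SO_PATTERN = re.compile(r"^(lib[\w+-]+)\.so\.(\d+)((?:\.\d+)*)$")


def parse_shared_object(filename):
    """Split a versioned .so filename into (stem, soname, full version).

    Returns None when the name carries no trailing version, e.g. "libfoo.so".
    """
    match = _SO_PATTERN.match(filename)
    if match is None:
        return None
    stem, major, rest = match.groups()
    return stem, f"{stem}.so.{major}", major + rest


print(parse_shared_object("libstdc++.so.6.0.8"))
```

Two libraries with different major numbers get different sonames and
can sit side by side, which is the property a bare "foo.dll" name does
not advertise.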

> There is a question of practicality versus purity.  I believe you are
> arguing for purity (of distro and build), and I am arguing that this is
> not as practical as a well implemented ABI.

Au contraire, purity is absolutely practical.  That's why most distros
are run pure, why Debian is so popular, why yum has transformed
rpm-based linux, why Gentoo exists at all (if you think that it SHOULD
exist at all:-).  At least purity is practical if it is practical to be
stable, if it is practical for things not to break, on a system that has
NEVER STOPPED DEVELOPING from the day the very first linux-based systems
were brought up to the present.  What isn't practical is the nightmare
that preceded apt, yup, yum, the horror of /usr/local builds, the
mish-mosh of "installshield" driven installs that often as not required
a reboot to activate (what?  changed something?  reboot!) until only a
year or two ago.

Of course you DO say "well implemented ABI", making it difficult to
argue.  I will persevere.  The question is, is this equivalent to saying
"Citizens are happiest and most productive living under a benevolent
dictatorship"?  Or "pigs with wings are very beautiful and greatly to be
desired, but be sure to wear a raincoat..."?  Something that is at best
true but useless (because well-implemented ABIs are rare and tend --
appropriately -- to be narrow as well), at worst false and misleading
(because well-implemented ABIs that are NOT narrow become a
straitjacket to developers as the installed code base develops a huge
inertia based on the extremely high cost of change)?

>> Debian also plays nicely here -- dependency awareness, build
>> consistency, strong resistance to multiple libraries satisfying any
>> given shared dependency and dependency loops and so on.
>
> Which all but one or two commercial ISVs completely ignore as a target
> distro.  This may change with Ubuntu, but not likely by much.

Absolutely.  The point being that linux is not terribly commercial
software friendly, largely because the kind of people who WRITE
commercial software are most unlikely to understand how linux
development (commercial or not) has to work.  And of course it isn't
like they designed any part of it to BE friendly to commercial software
development -- quite the opposite, as often as not.

Seriously, pick almost any package available on almost any linux.  Does
it not run on all linuces?  Sure it does.  Often in more or less the
same revision (derived from the same CVS-stable branch head).  Often
prepackaged at the primary CVS tree so that one has at most to edit the
makefile or drop in the relevant packaging scripts and do a three minute
rebuild to get a package ready to go into any PURE distro, guaranteed to
be at least build clean and with a bug-tracking process in place to help
work out runtime problems.

ISVs just don't get this.  Screw the open source vs closed source issue.
I personally tend to be mostly pro-Open, but not to the point of
religious fanaticism (the economics of software development are too
complex for religious simplicity IMO), but it doesn't matter.  If one
develops one's software using the METHODS that everybody is using for
open source software on linux -- version control, autoconfiguration,
using the "right" distro libraries (namely those that permit non-viral
linking or roll your own) -- it is perfectly straightforward to build a
product that can be reasonably inexpensively rebuilt and maintained
across nearly all of the linux distros in "pure" form.  livna does this
now for repackaged BINARY stuff, for FREE.  Vendors spend more every day
on maintaining their WinXX code base against upgrades and library
problems, I'm sure, than they'd need to spend on maintaining pure
install-ready versions of their software that would load and run on at
least the primary flavors of linux.

>> WinDoze?  Don't make me laugh.  Anybody in the room who hasn't broken at
>> least one WinXX app by installing another (without the slightest warning
>> or complaint from the OS or install process) please raise their hand.
>> Hmmm, not a lot of hands out there....;-)
>
> I especially like the remove feature which occasionally takes down the
> OS by removing a critical .dll.  :(

If by occasionally you mean "regularly and with little warning" I agree.
Although XP has (finally) gotten a bit better in this regard.

Then there is the complement to this "feature" -- that the software you
are trying to remove that provides the "critical" DLL is some piece of
commercial crap-ware like AOL that will dun you forever and a day
exhorting you to join AOL and reap its many benefits.  Software that
installs itself so deeply and insidiously that it will just replace its
icon on the desktop no matter how many times you remove it.  I even
discovered a version of AOL that was "remove-proof" -- windows simply
refused to remove it, or it would SAY it was going to remove it but it
lied, or it regenerated, or something.  I finally went and just deleted
AOL's entire directory tree by hand, risking (I'm sure) the utter
immolation of the system in question although it did seem (finally) to
work.

I mean, is Windows anything but a giant billboard for software on top of
a cheap binary loader of same?  Really?  I didn't think so.  Even my
wife (who doesn't care what she runs as long as it is easy) gets
irritated by the ads, by the damn paper clip and other popups, by the
fact that windows only "works" if it comes with Office and hence costs
at least $400 to $600 on a typical desktop that ITSELF might have cost
only $600 to $1000 and REQUIRES AV protection or you WILL be nibbled to
death by gnomes...

>> So FUNCTIONALLY sure, DLLs and SO are the same thing.  Practically
>> speaking, Wine and/or Cedega typically install applications (and all
>> their application-specific DLLs) in their own independent fake trees
>> just to avoid this very problem, because it is difficult enough to debug
>> a WinXX emulator as it is without having to also debug dueling DLLs.
>
> My arguments are that the Microsoft cluster bits will be important for
> people who don't care about the underlying technology, and simply want "a
> cluster".  Loaded statement granted, but Microsoft is not known for weak
> marketing, and they will go to town with perceived weaknesses, real,
> imagined, or concocted.

Oh I agree, I agree.  No argument.  Most of my first reply was half
tongue in cheek as I'm sure you guessed, I'll take it out now -> :-P

However, I think that the point (that somebody made earlier in the
thread IIRC) that DLLs are likely to become a major stumbling block in
MS clusterware is probably correct, at least in part for the reasons
given above.  Even deciding how/what to install on a "Windows Cluster
Node" will be a nontrivial problem.  Microsoft has fairly tight coupling
between their software and the GUI, to say the least.  Their entire
software development process at this point is graphical, and I suspect
that a whole lot of the I/O streams one might use (or have used in
already existing code being ported) are linked out of necessity to DLLs
that are intrinsic to the GUI -- keyboard, mouse, "stdout/stderr", all
have been wrapped/encapsulated.

So then how do you manage "a node"?  Install the full GUI-based OS on
it, but run it without a keyboard, mouse, video out?  What does the OS
do without KVM at all?  What do DLLs that presume that you HAVE a KVM do
when you don't, or do you insist that every node be KVM equipped?  Even
under linux this has been problematic in the past -- it didn't use to
be terribly easy to run a box with no KVM and is still likely difficult
not to have at least V for a "console".

Fine, so your app uses a library that generates an error, the error
WANTS to pop up an error message inside a window on your screen with a
mouse-driven "close" button, but there ain't no screen.  What happens?
This isn't just a MICROSOFT problem -- the DLL might be your own,
developed for your application when it was single threaded, single CPU,
running on a system that by assumption HAS a KVM and windowing interface
running.  Moving software DEVELOPED under Windows from Windows to
Windows Cluster might well be more difficult than moving software from
Linux (cluster or not) to Windows Cluster!
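
The failure mode described above goes away if the error path checks for
a screen before trying to use one.  A minimal Python sketch, assuming an
X11-style environment where a set DISPLAY variable signals that a
display is available; report_error and its show_dialog callback are
hypothetical stand-ins for real GUI code, not any library's API.

```python
import os
import sys


def report_error(message, environ=None, stream=None, show_dialog=None):
    """Report an error in a dialog if a display exists, else on a text stream.

    show_dialog is a hypothetical callback standing in for real GUI code.
    Returns "dialog" or "stream" to say which path was taken.
    """
    environ = os.environ if environ is None else environ
    stream = sys.stderr if stream is None else stream
    if environ.get("DISPLAY") and show_dialog is not None:
        show_dialog(message)
        return "dialog"
    # No screen (or no GUI code available): never block waiting for a click.
    stream.write(f"error: {message}\n")
    return "stream"
```

On a headless node the message lands on stderr, where a batch system or
log file can pick it up, instead of waiting on a "close" button nobody
can press.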

Again, the issue of dependency is key.  In linux, such a GUI-based app
would have a dependency on certain X libraries.  Those libraries could
in turn be loaded independent of whether or not the OS core was running
X (and could still lead to problems, obviously).  However, the problems
will be MUCH greater on a system that doesn't really have the equivalent
of an "init 3" state, a system that cranks up a whole virtual copy of
Windows in order to run a "remote desktop" right now.  Windows is still
"infinitely behind" X when it comes to distributed networked computation
of the most primitive sort -- just running a trivial application on
system A and having its GUI appear on system B -- AND has the problem
that it doesn't really grok command line software any more so that all
applications that have been developed for Windows in the last five or
ten years (solidly post DOS days) will require a SERIOUS and DIFFICULT
port performed by programmers that are absolutely clueless about writing
code without a GUI ABI built right in.

I think it will be downright amusing, if you think that other people's
pain and suffering is amusing.  I do, kinda.  At least if they are using
Windows on purpose...;-)

    rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




